MAXIMAL MULTILINEAR OPERATORS
Abstract. We establish multilinear $L^p$ bounds for a class of maximal multilinear
averages of functions of one variable, reproving and generalizing the bilinear maximal
function bounds of Lacey [13]. As an application we obtain almost everywhere
convergence results for these averages, and in some cases we also obtain almost everywhere
convergence for their ergodic counterparts on a dynamical system.



1. Introduction 

Let $n > 1$, $m \ge 1$ and consider an $(n-1) \times m$ real-valued matrix $A = (a_{i,j})_{i=1,\,j=1}^{n-1,\;m}$.
This naturally gives rise to the multilinear averages:

$$T_{A,\mathbb{R},r}(f_1,\dots,f_{n-1})(x) := \frac{1}{(2r)^m} \int_{|t_1|,\dots,|t_m| \le r} \prod_{i=1}^{n-1} f_i\Big(x + \sum_{j=1}^{m} a_{i,j} t_j\Big)\,dt, \qquad (1)$$

where $r > 0$ and $f_1,\dots,f_{n-1}$ are arbitrary measurable functions on $\mathbb{R}$. Part of the
motivation for considering such averages comes from ergodic theory. Let $\mathbf{X} = (X,\Sigma,m,S)$
be a dynamical system, i.e. a complete probability space $(X,\Sigma,m)$ endowed with an
invertible bimeasurable transformation $S : X \to X$ such that $mS^{-1} = m$. We define the
iterates $S^n : X \to X$ for $n \in \mathbb{Z}$ in the usual manner. In case the matrix $A$ has integer
entries, one can consider the following ergodic averages:

$$T_{A,\mathbf{X},L}(f_1,\dots,f_{n-1})(x) := \frac{1}{(2L+1)^m} \sum_{|l_1|,\dots,|l_m| \le L}\ \prod_{i=1}^{n-1} f_i\Big(S^{\sum_{j=1}^{m} a_{i,j} l_j} x\Big). \qquad (2)$$

We use $L^p(\mathbb{R})$ to denote the usual Lebesgue spaces on $\mathbb{R}$, and $L^p(\mathbf{X})$ to denote the
Lebesgue spaces on the dynamical system $\mathbf{X}$.
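
To make the ergodic averages (2) concrete, the following Python sketch evaluates them on a toy system. Everything specific here is an assumption made for illustration and is not taken from the text: the space is $X = \mathbb{Z}/N\mathbb{Z}$ with normalized counting measure, the transformation is the shift $S(x) = x + 1 \bmod N$, and the name `ergodic_average` is hypothetical.

```python
from itertools import product

def ergodic_average(fs, A, x, L, N):
    """Evaluate the average (2) at the point x of the cyclic system Z/NZ,
    where S(x) = x + 1 mod N, so that S^k(x) = (x + k) mod N.

    fs : list of n-1 functions on {0, ..., N-1}
    A  : (n-1) x m integer matrix given as a list of rows
    """
    m = len(A[0])
    total = 0.0
    # Sum over all l = (l_1, ..., l_m) with |l_j| <= L.
    for l in product(range(-L, L + 1), repeat=m):
        term = 1.0
        for f, row in zip(fs, A):
            power = sum(a * lj for a, lj in zip(row, l))
            term *= f((x + power) % N)
        total += term
    return total / (2 * L + 1) ** m

def is_even(y):
    return 1.0 if y % 2 == 0 else 0.0

def one(y):
    return 1.0

# Bilinear-type average with A = (1, 2)^T on Z/12Z, evaluated at x = 0.
value = ergodic_average([is_even, one], [[1], [2]], 0, 3, 12)
```

With the second function identically 1, the average simply counts how often $S^l x$ lands in the even residues for $|l| \le 3$.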

In this paper we shall be primarily concerned with the problem of almost everywhere
convergence of these averages as $r \to 0$ or $L \to \infty$ in the case that the $f_i$ obey some
$L^{p_i}$ type integrability condition. As is well known, such problems are related to the
boundedness properties of the maximal operator



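$$T^*_{A,\mathbb{R}}(f_1,\dots,f_{n-1})(x) := \sup_{r>0} |T_{A,\mathbb{R},r}(f_1,\dots,f_{n-1})(x)| = \sup_{r>0} \Big| \frac{1}{(2r)^m} \int_{|t_1|,\dots,|t_m| \le r} \prod_{i=1}^{n-1} f_i\Big(x + \sum_{j=1}^{m} a_{i,j} t_j\Big)\,dt \Big| \qquad (3)$$

or the closely related maximal operator

$$T^*_{A,\mathbf{X}}(f_1,\dots,f_{n-1})(x) := \sup_{L>0} |T_{A,\mathbf{X},L}(f_1,\dots,f_{n-1})(x)| = \sup_{L>0} \Big| \frac{1}{(2L+1)^m} \sum_{|l_1|,\dots,|l_m| \le L}\ \prod_{i=1}^{n-1} f_i\Big(S^{\sum_{j=1}^{m} a_{i,j} l_j} x\Big) \Big|. \qquad (4)$$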

It turns out that standard transference arguments allow one to convert any positive or
negative boundedness result for $T^*_{A,\mathbb{R}}$ to one for $T^*_{A,\mathbf{X}}$ and vice versa; see Proposition 14.1.
Thus we shall view the boundedness problems for these two maximal operators as being
equivalent.

Since one can easily establish convergence for (1) in any reasonable topology when the
$f_1,\dots,f_{n-1}$ are smooth, compactly supported functions, a standard density argument
then shows that as soon as the maximal operator $T^*_{A,\mathbb{R}}$ maps $L^{p_1}(\mathbb{R}) \times \dots \times L^{p_{n-1}}(\mathbb{R})$ to
weak $L^q(\mathbb{R})$ for some $0 < q < \infty$, the averages (1) will converge pointwise almost
everywhere when $f_i \in L^{p_i}(\mathbb{R})$ for $1 \le i \le n-1$, at least in the case when all the
$p_1,\dots,p_{n-1}$ are finite$^1$. In fact these averages will converge almost everywhere to the
pointwise product $f_1 \cdots f_{n-1}$. In the converse direction, Stein's maximal principle [19]
shows that in many cases, almost everywhere convergence of (1) can only be established
via such weak $L^q$ bounds on the maximal operator $T^*_{A,\mathbb{R}}$.
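
The convergence of (1) to the pointwise product for nice functions can be checked numerically. The Python sketch below treats only the case $m = 1$ and approximates the integral by a midpoint rule; the function name `multilinear_average`, the test functions, and the sample point are illustrative assumptions (the test functions are smooth but, for convenience, not compactly supported, which does not affect this local check).

```python
import math

def multilinear_average(fs, coeffs, x, r, steps=2000):
    """Midpoint-rule approximation of the m = 1 case of (1):
    (1/(2r)) * integral over |t| <= r of prod_i fs[i](x + coeffs[i] * t) dt."""
    h = 2.0 * r / steps
    total = 0.0
    for k in range(steps):
        t = -r + (k + 0.5) * h          # midpoint of the k-th subinterval
        term = 1.0
        for f, a in zip(fs, coeffs):
            term *= f(x + a * t)
        total += term * h
    return total / (2.0 * r)

def bump(y):
    return math.exp(-y * y)

def wave(y):
    return math.cos(3.0 * y)

x = 0.4
target = bump(x) * wave(x)               # pointwise product f_1(x) f_2(x)
# Distance between the bilinear average (a_1 = 1, a_2 = 2) and the product,
# for a decreasing sequence of radii r.
errors = [abs(multilinear_average([bump, wave], [1.0, 2.0], x, r) - target)
          for r in (1.0, 0.1, 0.01)]
```

The list `errors` shrinks as $r$ decreases, in line with the easy convergence for smooth functions; the content of the maximal theorems is what allows this to be pushed to rough $L^{p_i}$ data.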

For the ergodic averages (2), the situation is more difficult because there is no obvious
counterpart of the class $C_c^\infty(\mathbb{R})$ of smooth compactly supported functions on which
the convergence is easy to establish$^2$. However, one can use the class $L^\infty(\mathbf{X})$ as a
substitute, in the sense that once almost everywhere convergence for $T_{A,\mathbf{X},L}$ is established
for $f_1,\dots,f_{n-1} \in L^\infty(\mathbf{X})$, one can extend this convergence result to the case when
$f_i \in L^{p_i}(\mathbf{X})$ provided that one knows that the maximal operator $T^*_{A,\mathbb{R}}$ maps
$L^{p_1}(\mathbb{R}) \times \dots \times L^{p_{n-1}}(\mathbb{R})$ to weak $L^q(\mathbb{R})$ for some $0 < q < \infty$, since transference arguments then
give an analogous boundedness statement for $T^*_{A,\mathbf{X}}$. Thus the problem of almost everywhere
convergence of $T_{A,\mathbf{X},L}$ for functions $f_i \in L^{p_i}(\mathbf{X})$ factors into two rather distinct
problems, namely establishing convergence for $L^\infty(\mathbf{X})$ functions (which is a problem in
ergodic theory), and establishing a bound for $T^*_{A,\mathbb{R}}$ (which is a problem in multilinear
harmonic analysis). In this paper we shall focus almost exclusively on the latter problem.
The former problem is quite difficult, except when $n = 2$; the $n = 3$ case already requires
a deep result of Bourgain [5], and convergence for higher $n$ is only proven for very special
averages (see e.g. [1]) or with additional spectral assumptions on the shift $S$ (see [2], [14],
[15]), and we will not make progress on these issues here.

Let us now discuss some important special cases of the above general setup. 



1 When one or more of the exponents is $\infty$ one can proceed by localization arguments, exploiting the
fact that an $L^\infty$ function is locally in $L^p$ for any $p < \infty$. This costs us an epsilon in the exponents but
in most of our results the range of exponents will be open and so this will not make any difference.

2 An alternate approach would be to establish either a $V^q$ variational estimate on $T_{A,\mathbf{X},L}$ in $L$ for some
$q < \infty$, or an oscillation inequality, since any of these automatically implies convergence as $L \to \infty$, in
the spirit of Doob's inequality or Lepingle's inequality. We will not pursue such an approach here, but
see for instance [4], [5].

1.1. Linear averages. If $n = 2$, $m = 1$, and $A = (a)$ for some non-zero integer $a$, then
we have

$$T_{A,\mathbb{R},r} f_1(x) = \frac{1}{2r} \int_{-r}^{r} f_1(x + at)\,dt$$

and

$$T_{A,\mathbf{X},L} f_1(x) = \frac{1}{2L+1} \sum_{l=-L}^{L} f_1(S^{al} x).$$

If $f_1$ is in $L^p(\mathbb{R})$ (resp. $L^p(\mathbf{X})$) for some $1 \le p \le \infty$, then the Lebesgue differentiation
theorem (resp. the Birkhoff ergodic theorem) shows that $T_{A,\mathbb{R},r} f_1$ (resp. $T_{A,\mathbf{X},L} f_1$) are
almost everywhere convergent. Both of these results require the Hardy-Littlewood maximal
inequality, which asserts that the Hardy-Littlewood maximal operator

$$M f_1(x) := \sup_{r>0} \frac{1}{2r} \int_{-r}^{r} |f_1|(x + t)\,dt$$

maps $L^1$ to weak $L^1$. The Lebesgue differentiation theorem follows immediately from the
maximal inequality, whereas the Birkhoff ergodic theorem requires that one first establish
almost everywhere convergence for a dense class such as $L^\infty(\mathbf{X})$.

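1.2. Bilinear averages. Let $n = 3$, $m = 1$, and $A = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$ for some distinct non-zero
integers $a_1, a_2$, thus

$$T_{A,\mathbb{R},r}(f_1,f_2)(x) = \frac{1}{2r} \int_{-r}^{r} f_1(x + a_1 t) f_2(x + a_2 t)\,dt$$

and

$$T_{A,\mathbf{X},L}(f_1,f_2)(x) = \frac{1}{2L+1} \sum_{l=-L}^{L} f_1(S^{a_1 l} x) f_2(S^{a_2 l} x).$$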

As a consequence of a deep theorem of Bourgain [5] (relying on Fourier analysis on the
torus), it is known that the averages $T_{A,\mathbf{X},L}(f_1,f_2)$ converge almost everywhere whenever
$f_1, f_2 \in L^\infty(\mathbf{X})$. Almost everywhere convergence in other classes then pivots on
understanding the bilinear maximal operator

$$T^*_{A,\mathbb{R}}(f_1,f_2)(x) = \sup_{r>0} \Big| \frac{1}{2r} \int_{-r}^{r} f_1(x + a_1 t) f_2(x + a_2 t)\,dt \Big|.$$

This operator clearly maps $L^\infty(\mathbb{R}) \times L^\infty(\mathbb{R}) \to L^\infty(\mathbb{R})$, and from the Hardy-Littlewood
maximal inequality it also maps $L^\infty(\mathbb{R}) \times L^1(\mathbb{R})$ or $L^1(\mathbb{R}) \times L^\infty(\mathbb{R})$ to weak $L^1$. This,
combined with bilinear interpolation, is enough to establish almost everywhere convergence of
the ergodic averages $T_{A,\mathbf{X},L}$ for $f_1 \in L^{p_1}(\mathbf{X})$, $f_2 \in L^{p_2}(\mathbf{X})$ when $1/p_1 + 1/p_2 < 1$ (one also
obtains the edge $1/p_1 + 1/p_2 = 1$ from this argument as long as $p_1, p_2 < \infty$). It was shown
by Lacey [13], using time-frequency analysis, that $T^*_{A,\mathbb{R}}$ in fact maps $L^{p_1}(\mathbb{R}) \times L^{p_2}(\mathbb{R})$ to
$L^q(\mathbb{R})$ whenever $\frac{1}{q} = \frac{1}{p_1} + \frac{1}{p_2}$ and $q > \frac{2}{3}$. This allows one to extend the almost everywhere
convergence result to the larger range $1/p_1 + 1/p_2 < 3/2$. It is an interesting question as
to whether this is the true limit for these results. Certainly one has boundedness for a
single-scale operator $T_{A,\mathbb{R},r}$ or $T_{A,\mathbf{X},L}$ all the way up to the range $1/p_1 + 1/p_2 < 2$. On the
other hand, the time-frequency approach is known to break down at $1/p_1 + 1/p_2 = 3/2$
(see [13]).






CIPRIAN DEMETER, TERENCE TAO, AND CHRISTOPH THIELE 



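1.3. Furstenberg averages. Let $n \ge 2$, $m = 1$, and let $A$ be the matrix

$$A := \begin{pmatrix} 1 \\ 2 \\ \vdots \\ n-1 \end{pmatrix}.$$

Then (1) becomes the multilinear average

$$T_{A,\mathbb{R},r}(f_1,\dots,f_{n-1})(x) = \frac{1}{2r} \int_{-r}^{r} \prod_{i=1}^{n-1} f_i(x + it)\,dt$$

and (2) becomes the Furstenberg average

$$T_{A,\mathbf{X},L}(f_1,\dots,f_{n-1})(x) = \frac{1}{2L+1} \sum_{l=-L}^{L} \prod_{i=1}^{n-1} f_i(S^{il} x).$$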



Note the cases $n = 2$, $n = 3$ are special cases of the linear and bilinear averages considered
earlier. These averages are related to the Furstenberg recurrence theorem [10] and to
Szemeredi's theorem on arithmetic progressions [20], and are also connected to the recent
result in [11] that the primes contain arbitrarily long progressions. For instance, the
Furstenberg recurrence theorem is essentially the assertion that

$$\liminf_{L \to \infty} \int_X T_{A,\mathbf{X},L}(f,\dots,f)\, f\,dm > 0$$

whenever $f$ is non-negative and does not vanish almost everywhere. The question of norm
convergence of $T_{A,\mathbf{X},L}$ is more difficult and has only been recently treated in the
independent works of Host and Kra [12] and Ziegler [21]. They showed that if $f_1,\dots,f_{n-1} \in
L^\infty(\mathbf{X})$ then $T_{A,\mathbf{X},L}(f_1,\dots,f_{n-1})$ converges in $L^2(\mathbf{X})$ norm (and hence in $L^p(\mathbf{X})$ norm for
any $1 < p < \infty$). Their approach relies on the reduction to convergence for functions in a
sub-$\sigma$-algebra $\mathcal{Z}_{n-1}$ of $\Sigma$, known as a characteristic factor, on which $S$ can be represented
as an inverse limit of translations on nilmanifolds. The advantage of such a concrete
representation is that this particular type of translation is quite well understood. In
particular, $\mathcal{Z}_0$ is the $\sigma$-algebra spanned by the invariant sets of powers of $S$, while the action
of $S$ on the Kronecker factor $\mathcal{Z}_1$ is isomorphic with a rotation on some abelian group.
The $\sigma$-algebras $\mathcal{Z}_k$ with $k \ge 2$ give rise to noncommutative factors which require a more
delicate analysis. The work in this paper will however proceed in a different direction,
focusing on the quantitative bounds of various operators associated to these averages rather
than analyzing characteristic factors. It is of course possible to extend these norm
convergence results to functions $f_i$ in other spaces $L^{p_i}(\mathbf{X})$ by exploiting boundedness properties
of $T_{A,\mathbf{X},L}$ or $T_{A,\mathbb{R},r}$, but we will not pursue this issue here, though we will mention that some
surprising subtleties in this problem in the case $1/p_1 + \dots + 1/p_n > 1$ have been uncovered
by Christ [6].

The problem of almost everywhere convergence, as opposed to norm convergence, for
the Furstenberg averages remains open even for $n = 4$. One can obtain some bounds
of the corresponding maximal operators in $L^p$ spaces by leveraging the corresponding
bounds in the bilinear setting. For instance one can extend Lacey's bilinear estimates
mentioned earlier to the multilinear setting by estimating all but two of the functions in
$L^\infty$. This ultimately leads to a bound on $T^*_{A,\mathbb{R}}$ from $L^{p_1}(\mathbb{R}) \times \dots \times L^{p_n}(\mathbb{R})$ to $L^q(\mathbb{R})$
whenever $1 < p_1,\dots,p_n < \infty$ and $1/q = 1/p_1 + \dots + 1/p_n < 3/2$.

1.4. Averages along cubes. The work of Host and Kra [12] related the norm
convergence of the above Furstenberg averages to the norm convergence of averages of cubes,
which is a special case of (2) with $n = 2^m$. To define them, let $V_m$ be the index set
$V_m := \{0,1\}^m \setminus \{0\}^m$. The averages on the $m$-dimensional cubes are

$$\frac{1}{(2L+1)^m} \sum_{\vec{l} \in \{-L,\dots,L\}^m}\ \prod_{\epsilon \in V_m} f_\epsilon(S^{\epsilon \cdot \vec{l}} x). \qquad (5)$$

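For example, when $m = 1$ (so $n = 2$) we just have a linear averaging operator. When
$m = 2$ (and so $n = 4$), this averaging operator along squares is essentially the same as
$T_{A,\mathbf{X},L}$ with

$$A := \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix},$$

while when $m = 3$ (and $n = 8$) the averaging operator along cubes is essentially $T_{A,\mathbf{X},L}$
with

$$A := \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$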

It is proved in [12] that the averages$^3$ in (5) have $\mathcal{Z}_{m-1}$ as a characteristic factor for
$L^2$-norm convergence, and as a consequence that these averages converge in $L^2(\mathbf{X})$ whenever
$f_\epsilon \in L^\infty(\mathbf{X})$. Using these characteristic factors, Assani [1] showed that these averages also
converge pointwise almost everywhere when $f_\epsilon \in L^\infty(\mathbf{X})$. It is somewhat peculiar that
these techniques do not seem to be able to give an alternative (non-Fourier analytical)
proof of Bourgain's pointwise result mentioned earlier.

To extend the latter $L^\infty(\mathbf{X})$ convergence result to an $L^p(\mathbf{X})$ convergence result requires
control of a maximal function. For the sake of concreteness let us just focus on the case
$m = 2$, where the relevant maximal function is

$$\sup_{r>0} \Big| \frac{1}{(2r)^2} \int_{-r}^{r} \int_{-r}^{r} f_{10}(x + t_1) f_{01}(x + t_2) f_{11}(x + t_1 + t_2)\,dt_1\,dt_2 \Big|.$$

One can deduce a certain number of bounds on this maximal function from the
Hardy-Littlewood maximal inequality and multilinear interpolation. Indeed, the maximal
inequality and Hölder's inequality imply that this maximal function lies in weak $L^{1/2}$
whenever two of $f_{10}, f_{01}, f_{11}$ lie in $L^1$ and the other one lies in $L^\infty$, while this maximal
function is trivially in $L^\infty$ when all three of $f_{10}, f_{01}, f_{11}$ lie in $L^\infty$. Interpolation then
gives bounds (and hence almost everywhere convergence of the associated averages along
squares) when $f_{01} \in L^{p_{01}}$, $f_{10} \in L^{p_{10}}$, $f_{11} \in L^{p_{11}}$ with $1/p_{01} + 1/p_{10} + 1/p_{11} < 2$, with an
extension to the boundary $1/p_{01} + 1/p_{10} + 1/p_{11} = 2$ when all of the exponents are finite.
As a corollary of our main result (which is proven using time-frequency techniques) we
shall be able to extend this range to $1/p_{01} + 1/p_{10} + 1/p_{11} < 5/2$, in analogy with the
situation for bilinear averages discussed earlier (see Corollaries 1.6, 1.7 below).

3 Actually, a more general class of averages is shown in [12] to have $\mathcal{Z}_{m-1}$ as a characteristic factor; we
refer the reader to [12] for the details.



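1.5. Main results. We now study the maximal operator $T^*_{A,\mathbb{R}}$ defined in (3) for a general
$(n-1) \times m$ matrix $A = (a_{i,j})_{i,j}$; we will allow the $a_{i,j}$ here to be non-integer as one can still
define $T^*_{A,\mathbb{R}}$ in this case. To state the main result we need some notation. We introduce
the extended matrix $E(A)$, which is the $n \times (m+1)$ matrix

$$E(A) := \begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,m} & 1 \\
a_{2,1} & a_{2,2} & \cdots & a_{2,m} & 1 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{n-1,1} & a_{n-1,2} & \cdots & a_{n-1,m} & 1 \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}.$$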

Note that the range of this matrix consists of all $n$-tuples of the form

$$\Big(x + \sum_{j=1}^{m} a_{1,j} t_j,\ \dots,\ x + \sum_{j=1}^{m} a_{n-1,j} t_j,\ x\Big)$$

for $x, t_1,\dots,t_m \in \mathbb{R}$.

A set of row indices $i$ is said to be a set of linear independence for a matrix $B$, if the
set of corresponding rows of $B$ is linearly independent. Given a matrix $A$, let $S_{A,\epsilon}$ for
$0 < \epsilon < 1/4$ be the set of all tuples $(x_1,\dots,x_{n-1})$ where $x_i \in \{0, 1/2 + \epsilon, 1 - \epsilon\}$ for all $i$,
there is at most one index $i$ with $x_i = 1/2 + \epsilon$, the indices $i$ with $x_i = 1 - \epsilon$ form a set of
linear independence for $A$, and the indices $i$ with $x_i \in \{1/2 + \epsilon, 1 - \epsilon\}$ form a set of linear
independence for $E(A)$. Let $H_{A,\epsilon}$ be the convex hull of $S_{A,\epsilon}$ and let $H_A$ be the union of
all $H_{A,\epsilon}$ with $0 < \epsilon < 1/4$.
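
The definition of $S_{A,\epsilon}$ is purely combinatorial once ranks can be computed, so it can be enumerated by brute force. The Python sketch below does this with exact rational arithmetic; the helper names (`rank`, `extended`, `sample_set`) and the choice $\epsilon = 1/10$ are assumptions made for illustration, and the matrix used is the bilinear one with $a_1 = 1$, $a_2 = 2$.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Rank of a list of rows, by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def independent(matrix, indices):
    """True if the rows of `matrix` with the given indices are linearly independent
    (the empty set of rows counts as independent)."""
    rows = [matrix[i] for i in indices]
    return rank(rows) == len(rows)

def extended(A):
    """The extended matrix E(A): append a column of ones, then the row (0,...,0,1)."""
    m = len(A[0])
    return [list(row) + [1] for row in A] + [[0] * m + [1]]

def sample_set(A, eps):
    """Enumerate S_{A,eps} directly from its definition (requires 0 < eps < 1/4)."""
    E = extended(A)
    half, one = Fraction(1, 2) + eps, 1 - eps
    points = []
    for x in product((Fraction(0), half, one), repeat=len(A)):
        halves = [i for i, v in enumerate(x) if v == half]
        ones = [i for i, v in enumerate(x) if v == one]
        if (len(halves) <= 1 and independent(A, ones)
                and independent(E, sorted(halves + ones))):
            points.append(x)
    return points

eps = Fraction(1, 10)
points = sample_set([[1], [2]], eps)   # bilinear case with a_1 = 1, a_2 = 2
```

For this matrix the enumeration returns seven points, among them $(1/2 + \epsilon, 1 - \epsilon)$ and $(1 - \epsilon, 1/2 + \epsilon)$, and excludes $(1 - \epsilon, 1 - \epsilon)$ because the two rows of $A$ are linearly dependent.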

The following is our main theorem: 

Theorem 1.1. Assume $n \ge 3$ and let $A$ be a matrix as above. Let $(p_1,\dots,p_{n-1})$ be a
tuple of exponents with

$$1 < p_i \le \infty \qquad (6)$$

for $1 \le i \le n-1$ and set

$$\frac{1}{p'_n} = \sum_{i=1}^{n-1} \frac{1}{p_i}. \qquad (7)$$

If $(1/p_1,\dots,1/p_{n-1}) \in H_A$, then the operator

$$T^*_{A,\mathbb{R}} : L^{p_1} \times \dots \times L^{p_{n-1}} \to L^{p'_n}$$

is bounded.

Remark 1.1. The condition (7) is mandated by scaling considerations (i.e. dimensional
analysis). As we shall see shortly, the theorem is trivial if one restricts the tuples $(1/p_i)$
to the convex hull of those points in $S_{A,\epsilon}$ which do not have a component equal to $1/2 + \epsilon$.
This happens in particular when $n = 2$. Thus, in a nutshell, we are gaining $1/2 - \epsilon$ over
the trivial estimates.

Remark 1.2. For some matrices $A$ we can obtain a better range of exponents than stated
in the theorem. Namely, when the matrix $A$ is a diagonal block matrix, we may gain $1/2$
for every block. More precisely, the argument works when $A$ is upper block triangular and
$E(A)$, modulo the last column and restricted to the rows other than the last row, is block
diagonal. The argument involves only separation of variables and Hölder's inequality, so
we shall not elaborate on this.

The following corollary is weaker than the theorem, but has the advantage of an easy
description of the range of exponents and covers many of the cases of interest. Define
the nondegeneracy rank of the matrix $A$, denoted by $\mathrm{rank}^*(A)$, to be the largest integer $r$
such that any $r$ rows of $A$ are linearly independent. It is an immediate observation that
$\mathrm{rank}^*(A) + 1 \ge \mathrm{rank}^*(E(A)) \ge \mathrm{rank}^*(A)$.

Corollary 1.2. Assume $n \ge 2$ and let $A$ be a matrix as above. Define the complexity
parameter $k = n - \mathrm{rank}^*(E(A))$. Let $(p_1,\dots,p_{n-1})$ be a tuple with

$$1 < p_i \le \infty$$

for $1 \le i \le n-1$ and set

$$\frac{1}{p'_n} = \sum_{i=1}^{n-1} \frac{1}{p_i}.$$

If

$$\frac{1}{p_1} + \dots + \frac{1}{p_{n-1}} < n - k - \frac{1}{2}, \qquad (8)$$

then the operator

$$T^*_{A,\mathbb{R}} : L^{p_1} \times \dots \times L^{p_{n-1}} \to L^{p'_n}$$

is bounded.

Proof. The closure of the region of tuples $(1/p_i)$ in the corollary is the intersection of
the cube $[0,1]^{n-1}$ with a half space. All extremal points of this set are on an edge of the
cube and thus have all but at most one coordinate in $\{0,1\}$. The only possible value for
the exceptional coordinate is $1/2$ as the right-hand side of (8) is equal to $1/2$ modulo the
integers. Thus the region in the corollary is the convex hull of all tuples $(x_1,\dots,x_{n-1})$ with
at most $n - k - 1$ components equal to $1 - \epsilon$, at most one component equal to $1/2 + \epsilon$ and
the remaining components equal to $0$. The corollary then follows from the rank conditions
on $A$ and $E(A)$ and the fact that $\mathrm{rank}^*(A) \ge n - k - 1$. ■

Remark 1.3. As discussed earlier, the boundedness results in Theorem 1.1 and Corollary
1.2 immediately imply almost everywhere convergence for $T_{A,\mathbb{R},r}(f_1,\dots,f_{n-1})$ as $r \to 0$
when $f_i \in L^{p_i}(\mathbb{R})$ if all the $p_i$ are finite, since this convergence is trivial for $f_i$ in the
dense class $C_c^\infty(\mathbb{R})$. The $p_i = \infty$ cases can also be handled by a localization argument
and exploiting some open-ness properties of $H_A$. The situation for the ergodic averages
is however substantially more difficult.

Remark 1.4. If rank*(E(A)) = rank(E(A)), then the regions described in Theorem 1.1 
and Corollary 1.2 are equal. 

Remark 1.5. It is worth noting that $p'_n$ can be less than 1, indeed it is less than 1 in all
nontrivial cases. In some cases one can get below 1 by using just Hölder's inequality and
interpolation, see for instance the discussion in Section 1.4.

Remark 1.6. Theorem 1.1 is a direct analog of the singular integral version in [16, Theorem
1.1], which roughly speaking replaces $T^*_{A,\mathbb{R}}$ with the related expression




for some Calderón-Zygmund kernel $K$. As a consequence the methods of proof are quite
similar. The parameter $k$ in Corollary 1.2 plays the same role as the parameter $k$ appearing
in [16, Theorem 1.1], measuring the complexity of the averages under investigation. The
case $k = 0$ for the singular integral version can be solved with classical methods, namely
Littlewood-Paley theory or wavelets, just as the case $k = 0$ for the maximal version can
be solved using the classical Hardy-Littlewood maximal inequality.

Readers familiar with [16] will observe that the range of exponents in Theorem 1.1 is
somewhat more permissive than that in [16]. More precisely, the restriction $k < \frac{n}{2}$ as well
as several restrictions on the exponents $p_i$ from [16] are not needed in Theorem 1.1. This
is a consequence of the fact that there are trivial reductions in the maximal operator case
if there are exponents $p_i = \infty$, while in the singular integral setting there are no such
trivial reductions. This explains why for instance we can obtain nontrivial estimates for
the trilinear maximal operator ($n = 4$, $k = 2$)

$$T^*(f_1,f_2,f_3)(x) := \sup_{\epsilon>0} \frac{1}{\epsilon} \int_{|t|<\epsilon} |f_1(x + a_1 t) f_2(x + a_2 t) f_3(x + a_3 t)|\,dt \qquad (9)$$

with $a_1, a_2, a_3$ pairwise different (see Example 1.5 below), despite the fact that no $L^p$
bounds of any sort are known for the trilinear Hilbert transform

$$\mathrm{p.v.} \int_{\mathbb{R}} f_1(x + a_1 t) f_2(x + a_2 t) f_3(x + a_3 t)\,\frac{dt}{t}.$$

Remark 1.7. It should be emphasized that the nontrivial estimates from the $k > 1$ cases
are all obtained by such trivial reductions to the case $k = 1$ and multilinear interpolation.
In other words, there is no special theory developed yet to address the case $k \ge 2$. It is
quite probable that more sophisticated techniques will extend the range of the exponents
in this case. An interesting connection concerns the fact that averages corresponding to
some $k \ge 0$ appear to have $\mathcal{Z}_k$ as a characteristic factor for $L^2$-norm convergence. In
particular, it is an exercise based on the techniques from [1] and from [5] to show that $\mathcal{Z}_k$
is the characteristic factor even for a.e. convergence, when $k = 0, 1$. This would support
the evidence that, as in the case of norm convergence, $k$ is the only parameter which
dictates the complexity of the averages and of the techniques needed for the proof.

Similar difficulties are encountered when dealing with polynomial maximal operators
such as $P^*(f_1,f_2)(x) := \sup_{\epsilon>0} \frac{1}{\epsilon} \int_{|t|<\epsilon} |f_1(x + t) f_2(x + t^2)|\,dt$. In all these instances,
the decomposition of the maximal operator, as explained in the third section below, gives
rise to a summation over a larger family of multidimensional cubes, each of which is
indexed by more than just one parameter. Curiously, the boundedness of the maximal
operator associated with polynomial averages, unlike the multilinear averages studied here
(see Proposition 14.1), does not in general transfer from harmonic analysis to ergodic
theory. Rather, the results in these two contexts have different meanings and
most probably rely on distinct ideas in their proofs. An illuminating contrast comes from
the fact that $\sup_{\epsilon>0} \frac{1}{\epsilon} \int_{|t|<\epsilon} |f(x + t^2)|\,dt$ can be easily bounded by the Hardy-Littlewood
maximal function, while Bourgain showed that the convergence of the ergodic averages
along squares needs completely new ideas [4].

Let us illustrate Theorem 1.1 and Corollary 1.2 with some examples. 

Example 1.3. Consider the bilinear averages from Section 1.2. Here the extended matrix
is

$$E(A) = \begin{pmatrix} a_1 & 1 \\ a_2 & 1 \\ 0 & 1 \end{pmatrix}.$$

One can check that $\mathrm{rank}(A) = 1$ and $\mathrm{rank}^*(E(A)) = \mathrm{rank}(E(A)) = 2$, and

$$S_{A,\epsilon} = \{(0,0),\ (0, 1/2 + \epsilon),\ (1/2 + \epsilon, 0),\ (0, 1 - \epsilon),\ (1 - \epsilon, 0),\ (1/2 + \epsilon, 1 - \epsilon),\ (1 - \epsilon, 1/2 + \epsilon)\}$$

and hence

$$H_A = \{(a,b) : 0 \le a, b < 1;\ a + b < 3/2\}.$$

In this case Theorem 1.1 and Corollary 1.2 give the same results, namely recovering
the bilinear maximal function estimates of Lacey [13] described earlier. Indeed we give
a reasonably self-contained$^4$ proof of the main results from [13] here, following Lacey's
approach.

Example 1.4. Consider the $n = 4$ Furstenberg average from Section 1.3. Here the
extended matrix is

$$E(A) := \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 0 & 1 \end{pmatrix}.$$

One can check that $\mathrm{rank}(A) = 1$ and $\mathrm{rank}^*(E(A)) = \mathrm{rank}(E(A)) = 2$, and $S_{A,\epsilon}$ consists of
those triples $(a,b,c)$ with $a, b, c \in \{0, 1/2 + \epsilon, 1 - \epsilon\}$, at most one of $a, b, c$ equal to $1/2 + \epsilon$,
and at most one of $a, b, c$ equal to $1 - \epsilon$. This gives

$$H_A = \{(a,b,c) : 0 \le a, b, c < 1;\ a + b + c < 3/2\}.$$

In this case, Theorem 1.1 and Corollary 1.2 recover the multilinear estimates mentioned
at the end of Section 1.3 that can be trivially obtained from Lacey's bilinear result. Similar
considerations apply to higher values of $n$.



4 We will require some results from other papers, notably the multilinear interpolation theory from 
[16], the weak Bessel inequality for forests (see e.g. [18]), a maximal Fourier inequality of Bourgain [4], 
and an interval selection lemma of Lacey [13]. 






Example 1.5. Consider the $m = 2$ average along squares from Section 1.4. Here the
extended matrix is

$$E(A) := \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$

One can check that $\mathrm{rank}(A) = 2$ and $\mathrm{rank}^*(E(A)) = \mathrm{rank}(E(A)) = 3$, and $S_{A,\epsilon}$ consists of
those triples $(a,b,c)$ with $a, b, c \in \{0, 1/2 + \epsilon, 1 - \epsilon\}$, at most one of $a, b, c$ equal to $1/2 + \epsilon$,
and at most two of $a, b, c$ equal to $1 - \epsilon$. This gives

$$H_A = \{(a,b,c) : 0 \le a, b, c < 1;\ a + b + c < 5/2\}.$$
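
The rank computations in Examples 1.3 to 1.5 can be verified mechanically. The Python sketch below computes the nondegeneracy rank by testing every subset of rows, then the complexity parameter $k = n - \mathrm{rank}^*(E(A))$ and the threshold $n - k - 1/2$ from (8). The function names are hypothetical, and the bilinear example is instantiated with the assumed values $a_1 = 1$, $a_2 = 2$.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Rank of a list of rows, by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def nondegeneracy_rank(M):
    """Largest r such that every choice of r rows of M is linearly independent."""
    best = 0
    for r in range(1, len(M) + 1):
        subsets = combinations(range(len(M)), r)
        if all(rank([M[i] for i in c]) == r for c in subsets):
            best = r
        else:
            break   # if some r rows are dependent, so are some r + 1 rows
    return best

def extended(A):
    """The extended matrix E(A): append a column of ones, then the row (0,...,0,1)."""
    m = len(A[0])
    return [list(row) + [1] for row in A] + [[0] * m + [1]]

def complexity(A):
    """Return (k, n - k - 1/2), where n - 1 is the number of rows of A."""
    n = len(A) + 1
    k = n - nondegeneracy_rank(extended(A))
    return k, Fraction(n - k) - Fraction(1, 2)

bilinear = [[1], [2]]                 # Example 1.3 with a_1 = 1, a_2 = 2
furstenberg = [[1], [2], [3]]         # Example 1.4
squares = [[0, 1], [1, 0], [1, 1]]    # Example 1.5
results = {name: complexity(A) for name, A in
           [("bilinear", bilinear), ("furstenberg", furstenberg), ("squares", squares)]}
```

The thresholds obtained this way agree with the sets $H_A$ described in the three examples: $3/2$ for the bilinear and $n = 4$ Furstenberg averages, and $5/2$ for the averages along squares.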

Combining the above example with Proposition 14.1 from Appendix 14, and the result 
of Assani [1], we obtain the following corollary. 

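Corollary 1.6. Let $1 < p_1, p_2, p_3 < \infty$ be such that $\frac{1}{p_1} + \frac{1}{p_2} + \frac{1}{p_3} < \frac{5}{2}$. For every dynamical
system $\mathbf{X} = (X,\Sigma,m,S)$, the averages on squares

$$\frac{1}{(2N+1)^2} \sum_{i=-N}^{N} \sum_{j=-N}^{N} f_1(S^{i} x) f_2(S^{j} x) f_3(S^{i+j} x)$$

converge for a.e. $x$, for each $f_i \in L^{p_i}(\mathbf{X})$.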

Remark 1.8. A version of Corollary 1.6 holds for all averages with $k = 1$. The
convergence for $L^\infty$ functions follows by using the aforementioned fact that these averages have
characteristic factor $\mathcal{Z}_1$ for pointwise convergence. We omit the details.

Remark 1.9. In [7] we use combinatorial methods involving sum set estimates to get 
nontrivial positive results in Corollary 1.6. This completely different approach gives the 
result only in a small range, p' 4 > ^ for some unspecified e, and does not seem to extend 
to the case when p' 4 is smaller then or even close to |. 

Remark 1.10. An interesting contrast to the results of Theorem 1.1 is provided by the
constructions from [8], showing that some maximal operators fail to be bounded when the
indices $p_i$, $1 \le i \le n-1$, are sufficiently close to 1. As a consequence, both Furstenberg's
averages with $n \ge 4$ and the averages on cubes with $m \ge 3$ are proved to diverge a.e.
in some range of $L^p$ spaces. The trilinear maximal operator from (9) has been proved
in [6] to be unbounded for pi = p 2 = p% — p, 1 < P < §, for appropriate choices of a 
depending on $p$. The main ingredient behind these negative results is the fact that the
polynomials $x + \sum_{j=1}^{m} a_{i,j} t_j$, $1 \le i \le n-1$, are linearly dependent in $\mathbb{R}[x, t_1,\dots,t_m]$ and
hence $\mathrm{rank}^*(E(A)) \le n - 2$ and $k \ge 2$. In other words, our tools provide negative results
only when $k \ge 2$, and all positive results are trivially deduced from positive results when
$k = 0, 1$. Further progress would require breaking this barrier in the complexity $k$, either
for positive or for negative results.

The following is the straightforward application of Corollary 1.2 to averages on cubes.
In this case, while $\mathrm{rank}(A)$ is the dimension of the cube, we have $\mathrm{rank}^*(A) = 2$, an
obstruction for higher nondegeneracy rank being the linear dependence of the polynomials
$t_1$, $t_2$, and $t_1 + t_2$. On the other hand, $\mathrm{rank}^*(E(A)) = 3$, and hence:

Corollary 1.7. Let $1 < p_\epsilon < \infty$, $\epsilon \in V_m$, be such that

$$\sum_{\epsilon \in V_m} \frac{1}{p_\epsilon} < \frac{5}{2}. \qquad (10)$$

For every dynamical system $\mathbf{X} = (X,\Sigma,m,S)$, the averages on $m$-dimensional cubes (5)
converge a.e. for each $f_\epsilon \in L^{p_\epsilon}(\mathbf{X})$.

This of course generalizes Corollary 1.6. It would be interesting to know whether one 
can improve over 5/2 on the right-hand-side of (10). Certainly the methods of this paper 
do not yield such an improvement, and [8] provides an upper bound of 28/5 for the 
right-hand-side of (10) for three dimensional cubes. 

Theorem 1.1 is proven using standard time-frequency strategies, and in particular
follows the approach of Lacey [13], though it is more self-contained and employs some
technical simplifications over that in [13]. In Section 2 we use the theory of multilinear
interpolation to reduce Theorem 1.1 to a model case, Theorem 2.3, in which the matrix
$A$ is in a simplified normal form, the functions $f_1,\dots,f_{n-1}$ have become $L^2$-normalized
functions adapted to certain sets $E_1,\dots,E_{n-1}$, and the output is being measured in
another set $E'_n$ which excludes a certain exceptional set determined by the Hardy-Littlewood
maximal function. In Sections 3, 4 we use the Fourier transform and wave packet
decomposition to reduce matters to bounding a certain model sum (Theorem 4.4) involving the
inner product of the functions $f_1,\dots,f_n$ with various wave packets (and maximal wave
packets) associated to a certain "rank one" collection of multitiles. To estimate this model
sum, we organize the collection of multitiles into trees; after obtaining an upper bound for
the contribution of a single tree (see Proposition 6.2 and Section 7) one quickly reduces
(essentially by summing a geometric series; see Section 6) to that of proving estimates
for a tree selection algorithm (Lemma 6.3), which in turn reduces to a certain maximal
Bessel inequality concerning wave packets in a forest (Theorem 9.1, slightly improving
and simplifying a similar result from [13]). This Bessel inequality will involve a certain
logarithmic-type loss involving the size parameter $2^m$, but by some "good-$\lambda$" type
reductions in Section 9 we can replace this factor with another logarithmic factor involving
instead the multiplicity $\|N_{\mathcal{F}}\|_{L^\infty}$ of the forest (Theorem 9.2). After some sparsification
of the tile set, some elimination of exceptional tiles, and duality, one reduces to
establishing a certain maximal Bessel inequality on two families of tiles (see (60) and (61)).
These inequalities are proven by using the time localization properties of wave packets,
a non-maximal Bessel inequality (proven in Section 13), and the Rademacher-Menshov
inequality. In the case of one of these inequalities (61), one also needs a maximal
inequality of Bourgain [4]. Finally, in an Appendix (Section 14) we present a standard
correspondence principle equating boundedness of maximal functions on $\mathbb{R}$ with maximal
functions on measure-preserving systems.

2. Interpolation reductions 

The rest of the paper is devoted to the proof of Theorem 1.1. We shall use the methods 
of multilinear time-frequency analysis and work entirely on R, thus we will not make any 
further reference to the dynamical system X. 

In this section we use some multilinear interpolation techniques to reduce the operator
$T^*_{A,\mathbb{R}}$ and the exponents $p_1,\dots,p_n$ to a standard form, and then to also reduce the input
functions $f_1,\dots,f_{n-1}$ (and an additional output function $f_n$ arising from duality) to
another standard form.

We first introduce some basic notation. If $E$ is a measurable subset of $\mathbb{R}$, we use $1_E$
to denote the indicator function of $E$ and $|E|$ to denote the Lebesgue measure. Also
$Mf(x) := \sup_{r>0} \frac{1}{2r} \int_{x-r}^{x+r} |f|(y)\,dy$ denotes the classical Hardy-Littlewood maximal
function. The notation $a \lesssim b$ or $a = O(b)$ means that $a \le Cb$ for some universal constant
$C$ (which will be allowed to depend on parameters such as $n$ and $p_1,\dots,p_n$), and $a \sim b$
means that $a \lesssim b$ and $b \lesssim a$. In some cases we will subscript the $\lesssim$ notation by a
parameter to emphasize the fact that the constant $C$ involved can depend on that parameter,
thus for instance $a \lesssim_\mu b$ means that $C$ can depend on $\mu$. If $x \in \mathbb{R}^n$ we use $\|x\|$ to denote
the Euclidean norm of $x$.

Now we can reduce the operator $T^*_{A,\mathbb{R}}$ and the exponents $p_1,\dots,p_n$ to a standard form.

Theorem 2.1 (First reduction). Let $n \ge 3$, let $\Sigma$ be a hyperplane in $\mathbb{R}^{n-1}$ containing the
origin but not containing any of the $n-1$ coordinate vectors $e_1,\dots,e_{n-1}$ or the vector
$(1,\dots,1)$. Then the $(n-1)$-linear operator $T^*$ defined by

$$T^*(f_1,\dots,f_{n-1})(x) = \sup_{r>0} \frac{1}{r^{n-2}} \int_{\vec{t} \in \Sigma :\ \|\vec{t}\| < r} |f_1(x + t_1) \cdots f_{n-1}(x + t_{n-1})|\,d\vec{t}$$

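is bounded from $L^{p_1}(\mathbb{R}) \times \dots \times L^{p_{n-1}}(\mathbb{R})$ to $L^{p'_n}(\mathbb{R})$ whenever $1 < p_1,\dots,p_{n-1} < 2$,

$$\frac{5}{2} - n < \frac{1}{p_n} < 3 - n, \quad \text{and} \quad \frac{1}{p'_n} = \frac{1}{p_1} + \dots + \frac{1}{p_{n-1}}.$$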

The bound of course depends on pi, . . . ,p n and the A«. 

Remark 2.1. Note that $\mathrm{rank}(E(A)) = \mathrm{rank}^*(E(A)) = n - 1$. Hence we are in the case
$k = 1$ of Corollary 1.2 and the corollary is equivalent to Theorem 1.1 in this case. The
condition that $\Sigma$ does not contain $e_1,\dots,e_{n-1}$ or $(1,\dots,1)$ corresponds to the nondegeneracy
condition in [16].

Proof [of Theorem 1.1 assuming Theorem 2.1] By multilinear interpolation as in [16] it
suffices to prove the estimate for tuples $(1/p_i)$ in $S_{A,\epsilon}$ for some $0 < \epsilon < 1/2$, so in particular
$1/p_i \in \{1/2 + \epsilon, 1 - \epsilon, 0\}$ for all $i$. We may of course assume the $f_i$ are non-negative. For
each index $i_0$ with $p_{i_0} = \infty$ we can trivially estimate $f_{i_0}$ by its supremum norm and remove
it from the maximal operator:

$$\sup_{r>0} \frac{1}{(2r)^m} \int_{|t_1|,\dots,|t_m| \le r} \prod_{i=1}^{n-1} f_i\Big(x + \sum_{j=1}^{m} a_{i,j} t_j\Big)\,dt
\le \|f_{i_0}\|_{L^\infty} \sup_{r>0} \frac{1}{(2r)^m} \int_{|t_1|,\dots,|t_m| \le r} \prod_{i \ne i_0} f_i\Big(x + \sum_{j=1}^{m} a_{i,j} t_j\Big)\,dt.$$

Doing this to each such exponent, we may assume without loss of generality that $1/p_i \in
\{1/2 + \epsilon, 1 - \epsilon\}$ for all $i$.
If $1/p_i = 1 - \epsilon$ for all $i$, then by definition of $S_{A,\epsilon}$ the rows of the matrix $A$ are linearly independent and we may do a change of variables so that $a_{i,j}$ is the Kronecker delta for $1 \le i, j \le n-1$. Of course the cube of integration in the parameter space $\{(t_1, \ldots, t_n)\}$ will be a parallelepiped in the new variables, but we may use the positivity of the $f_i$ and estimate the characteristic function of the parallelepiped by that of a cube, conceding a bounded loss in the estimates. We may also assume that $A$ is a square matrix of dimension $m = n-1$, since in the case $m > n-1$ we may fix the variables $t_j$ with $j > n-1$ and apply the result in the square matrix case to fixed translates of the functions $f_i$, obtaining an $L^p(\mathbb{R})$ bound independently of the translation. Then we perform a dummy average in the variables $t_j$ with $j > n-1$ to obtain the desired estimate. In the square matrix case we estimate

$$\sup_{r>0} \frac{1}{(2r)^{n-1}} \int_{|t_1|, \ldots, |t_{n-1}| < r} \prod_{i=1}^{n-1} f_i(x+t_i)\,dt \lesssim \prod_{i=1}^{n-1} \sup_{\varepsilon>0} \frac{1}{\varepsilon} \int_{|t| < \varepsilon} |f_i(x+t)|\,dt$$

and then apply the Hardy-Littlewood maximal theorem for $L^{1+\epsilon}$ and Hölder's inequality to obtain the desired estimate.

It remains to consider the case when $1/p_j = 1/2 + \epsilon$ for one index $j$ and $1/p_i = 1 - \epsilon$ for $i \ne j$; note that this places these exponents in the situation of Theorem 2.1. We may assume that $n \ge 3$ since the $n = 2$ case follows from the Hardy-Littlewood maximal inequality. By symmetry we may assume that $j = n-1$. The first $n-2$ rows of $A$ are linearly independent and we may assume that $(a_{i,j})_{1 \le i,j \le n-2}$ is the Kronecker delta. We may assume that the last row of $A$ is a linear combination of the other rows, or otherwise we can apply the reasoning of the previous paragraph. By a reasoning as in the previous paragraph we may also assume that $m \le n-2$. Thus after a change of variables if necessary (and covering the resulting parallelepiped by a ball) the operator $T^*_{A,\mathbb{R}}$ takes the form (11) for some hyperplane $\Sigma$. If $\Sigma$ contains $e_i$, then we perform the $t_i$ average first, estimate the average using the Hardy-Littlewood maximal function of $f_i$, and use Hölder's inequality to reduce matters to the case with one function less. We may thus assume that $\Sigma$ does not contain any of the $e_i$. Finally, the hypothesis that the first $n-1$ rows of $E(A)$ are linearly independent implies that $\Sigma$ does not contain $(1, \ldots, 1)$, and the claim now follows from Theorem 2.1. ■



Remark 2.2. If the hypothesis $\frac{5}{2} - n < \frac{1}{p_n} < 3 - n$ is replaced by $1 < p_n < \infty$ then Theorem 2.1 is easy to prove. Indeed, in this case we can use Hölder's inequality to obtain the pointwise estimate

$$T^*(f_1, \ldots, f_{n-1})(x) \lesssim \prod_{i=1}^{n-1} \big(M|f_i|^{p_i/p_n'}\big)^{p_n'/p_i}(x),$$

at which point the claim follows from the Hardy-Littlewood maximal inequality.

To prove Theorem 2.1, it suffices to prove the following "restricted weak-type" analogue. For any measurable $E \subset \mathbb{R}$, let $X(E)$ denote the space of functions supported on $E$ which are bounded in magnitude by 1.
Theorem 2.2 (Second reduction). Let $n \ge 3$, and let $\Sigma$ and $T^*$ be as in Theorem 2.1. Let $E_1, \ldots, E_n$ be subsets of $\mathbb{R}$ of positive finite measure. Let $p_1, \ldots, p_n$ be such that $1 < p_1, \ldots, p_{n-1} < 2$, $\frac{5}{2} - n < 1/p_n < 3 - n$, and

$$\frac{1}{p_n'} = \frac{1}{p_1} + \cdots + \frac{1}{p_{n-1}}.$$

Then there exists a subset $E_n'$ of $E_n$ with $|E_n'| \ge \frac{1}{2}|E_n|$ such that one has

$$\Big| \int T^*(g_1, \ldots, g_{n-1})\, g_n \Big| \lesssim |E_1|^{1/p_1} \cdots |E_n|^{1/p_n}$$

for all $g_1 \in X(E_1), \ldots, g_{n-1} \in X(E_{n-1}), g_n \in X(E_n')$. Here the implied constant is allowed to depend on $n, p_1, \ldots, p_n$ and $\Sigma$.

In the notation of [16], Theorem 2.2 asserts that the $n$-sublinear form $\int T^*(f_1, \ldots, f_{n-1}) f_n$ is of restricted type $(1/p_1, \ldots, 1/p_n)$ with $n$ as the bad index. The deduction of Theorem 2.1 from Theorem 2.2 follows from a variant of the Marcinkiewicz interpolation theorem and is a minor modification of the argument in [16, Lemma 3.11]; the details will be omitted here. The point of Theorem 2.2 is that the functions $g_1, \ldots, g_n$ have been normalized; indeed $g_j$ can be thought of as essentially the indicator function of $E_j$ (or $E_n'$ when $j = n$).

By a limiting argument we may take $E_1, \ldots, E_n$ to be finite unions of intervals, and $g_1, \ldots, g_n$ to be smooth; this allows us to justify a number of formal computations in the sequel without difficulty, and we shall do so without any further comment.

To prove Theorem 2.2, we may apply a rescaling argument to normalize $|E_n| = 1$. From the Hardy-Littlewood maximal inequality we may then set

$$E_n' := E_n \setminus \Omega \tag{12}$$

where $\Omega$ is the exceptional set

$$\Omega := \bigcup_{i=1}^{n} \{ M 1_{E_i} > C|E_i| \} \tag{13}$$

for a sufficiently large absolute constant $C$, so that $|E_n'| \sim 1$. It is convenient to renormalize $\alpha_i := 1/p_i - 1/2$ and $f_i := g_i/|E_i|^{1/2}$; thus $f_i$ lives in the $L^2$-normalized space $X_2(E_i)$ of functions supported on $E_i$ and bounded in magnitude by $1/|E_i|^{1/2}$. We also set $\alpha_n := \frac{n-2}{2} - \alpha_1 - \cdots - \alpha_{n-1}$; thus $0 < \alpha_n < 1/2$. Theorem 2.2 now reduces to

Theorem 2.3 (Third reduction). Let $n \ge 3$, and let $\Sigma$ and $T^*$ be as in Theorem 2.1. Let $E_1, \ldots, E_n$ be finite unions of intervals with $|E_n| = 1$, and let $E_n'$ be defined by (12), (13), so that $|E_n'| \sim 1$. Then one has

$$\Big| \int T^*(f_1, \ldots, f_{n-1})\, f_n \Big| \lesssim |E_1|^{\alpha_1} \cdots |E_{n-1}|^{\alpha_{n-1}}$$

for all smooth $f_1 \in X_2(E_1), \ldots, f_{n-1} \in X_2(E_{n-1}), f_n \in X_2(E_n')$ and any $0 < \alpha_1, \ldots, \alpha_n < 1/2$ with $\alpha_1 + \cdots + \alpha_n = \frac{n-2}{2}$. The implied constant can depend on $n, \alpha_1, \ldots, \alpha_n, \Sigma$.

This reduction is slightly more convenient to work in as the $L^2$ normalization of $f_1, \ldots, f_n$ will be useful for a certain "(maximal) Bessel inequality" which is crucial to a later stage of the argument.
3. Fourier representation

Our task is now to prove Theorem 2.3. As in [13], we begin by replacing the rather rough truncation in (11) by a smoother one which has a more tractable Fourier representation. As is customary, for any $f \in L^1(\mathbb{R})$, we define the Fourier transform

$$\hat f(\xi) := \int_{\mathbb{R}} e^{-2\pi\sqrt{-1}\,x\xi} f(x)\,dx$$

and the inverse Fourier transform

$$\check f(x) := \int_{\mathbb{R}} e^{2\pi\sqrt{-1}\,x\xi} f(\xi)\,d\xi;$$

we use $\sqrt{-1}$ here instead of $i$ in order to free up the letter $i$ for use as an integer-valued index.

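Let us fix the hyperplane $\Sigma$. We view the hyperplane $\Sigma$ as an $(n-2)$-dimensional Euclidean space with Lebesgue measure $dt$, and thus endowed with its own Fourier transform; thus if $\theta$ is a Schwartz function on $\Sigma$ we have the inverse Fourier transform

$$\check\theta(t) := \int_\Sigma e^{2\pi\sqrt{-1}\,t\cdot\xi}\, \theta(\xi)\,d\xi.$$

We now introduce the multilinear operator

$$T_\theta(f_1, \ldots, f_{n-1})(x) := \int_\Sigma \Big(\prod_{j=1}^{n-1} f_j(x+t_j)\Big) \check\theta(t)\,dt;$$

this operator can also be written in Fourier space as

$$T_\theta(f_1, \ldots, f_{n-1})(x) = c_\Sigma \int_{\mathbb{R}^{n-1}} \Big(\prod_{j=1}^{n-1} \hat f_j(\xi_j)\Big)\, \theta(\pi(\xi))\, e^{2\pi\sqrt{-1}\,x(\xi_1 + \cdots + \xi_{n-1})}\,d\xi$$

where $\pi : \mathbb{R}^{n-1} \to \Sigma$ is the orthogonal projection onto $\Sigma$ and $c_\Sigma > 0$ is a normalization constant depending only on $\Sigma$. For any integer $k$, write $\theta_k(\xi) := \theta(2^k\xi)$. We define the associated maximal function $T^*_\theta$ as

$$T^*_\theta(f_1, \ldots, f_{n-1})(x) := \sup_{k \in \mathbb{Z}} |T_{\theta_k}(f_1, \ldots, f_{n-1})(x)|.$$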

We shall deduce Theorem 2.3 from

Theorem 3.1 (Fourth reduction). Let $n \ge 3$, and let $\Sigma$ be as in Theorem 2.1. Let $0 < \alpha_1, \ldots, \alpha_n < 1/2$ with $\alpha_1 + \cdots + \alpha_n = \frac{n-2}{2}$. Let $\theta$ be a smooth function supported on a ball $\{\xi \in \Sigma : \|\xi\| \le 4\}$ which is constant on a ball $\{\xi \in \Sigma : \|\xi\| \le 1/4\}$, and obeys the estimate

$$|\check\theta(t)| \lesssim (1 + \|t\|)^{-N^3} \quad \text{for all } t \in \Sigma \tag{14}$$

for some large integer $N$ depending on $\alpha_1, \ldots, \alpha_n$. Let $E_1, \ldots, E_n$ be finite unions of intervals with $|E_n| = 1$, and let $E_n'$ be defined by (12), (13). Then one has

$$\Big| \int T^*_\theta(f_1, \ldots, f_{n-1})\, f_n \Big| \lesssim |E_1|^{\alpha_1} \cdots |E_{n-1}|^{\alpha_{n-1}}$$

for all smooth $f_1 \in X_2(E_1), \ldots, f_{n-1} \in X_2(E_{n-1}), f_n \in X_2(E_n')$. The implied constant can depend on $\Sigma, \alpha_1, \ldots, \alpha_{n-1}, N$ and on the implicit constant in (14).

Proof [of Theorem 2.3 assuming Theorem 3.1] We may take $f_1, \ldots, f_{n-1}$ non-negative.

Let $\eta$ be a fixed real-valued symmetric Schwartz function$^5$ on $\Sigma$ supported on the ball $\{\|\xi\| \le 1\}$ whose Fourier transform is non-negative and $\eta(0) = 1$. Observe that

$$T_{\eta_k}(f_1, \ldots, f_{n-1})(x) = \frac{1}{2^{k(n-2)}} \int_\Sigma \Big(\prod_{j=1}^{n-1} f_j(x+t_j)\Big) \check\eta(t/2^k)\,dt.$$

From this, the positivity of the $f_i$ and $\check\eta$ it is easy to establish the pointwise estimate

$$T^*(f_1, \ldots, f_{n-1})(x) \lesssim T^*_\eta(f_1, \ldots, f_{n-1})(x)$$

(where the implied constant depends on $\eta$) so it suffices to show that

$$\Big| \int T^*_\eta(f_1, \ldots, f_{n-1})\, f_n \Big| \lesssim |E_1|^{\alpha_1} \cdots |E_{n-1}|^{\alpha_{n-1}}.$$

We cannot yet apply Theorem 3.1, because $\eta$ is not constant near the origin. Indeed the requirement that $\check\eta$ be non-negative forces $\eta$ to have a negative Laplacian at the origin. Fortunately, we can rectify this by a further dyadic decomposition. More precisely, we split

$$\eta = \eta_2 + \sum_{l=-\infty}^{\infty} \phi_l$$

with $\eta_2$ smooth, symmetric, supported in $\|\xi\| \le 11/10$ and equal to 1 on $\|\xi\| \le 1$, while $\phi_l(\xi) := (\eta - \eta_2)(\xi)\,\big(\eta_2(\xi/2^l) - \eta_2(\xi/2^{l-1})\big)$. One can easily verify that the function $\eta_2$ is already of the form required for Theorem 3.1 and so $T^*_{\eta_2}$ gives an acceptable contribution to $T^*_\eta$. As for the tail terms $\phi_l$, we observe the Fourier estimates

$$\Big| \frac{1}{2^{l(n-2)}}\, \check\phi_l\Big(\frac{t}{2^l}\Big) \Big| \lesssim \frac{2^{-|l|}}{(1 + \|t\|)^{N^3}}$$

uniformly in $l$. Also, $\phi_l$ is constant on $\|\xi\| \le 2^l/4$ and zero when $\|\xi\| \ge 4 \times 2^l$. A simple rescaling argument using Theorem 3.1 (noting that $T^*_\phi$ is unchanged if one replaces $\phi$ by $\phi(2^l \cdot)$) then shows that

$$\Big| \int T^*_{\phi_l}(f_1, \ldots, f_{n-1})\, f_n \Big| \lesssim 2^{-|l|}\, |E_1|^{\alpha_1} \cdots |E_{n-1}|^{\alpha_{n-1}}.$$

The claim now follows from the triangle inequality. ■



$^5$ Such a function can be constructed by starting with a real-valued symmetric function on the ball $\{\|\xi\| \le 1/2\}$, then convolving it with itself and normalizing it.
4. Discretization

It remains to prove Theorem 3.1. We now perform the usual dyadic decompositions to 
reduce matters to estimating a certain sum over dyadic objects, namely a collection of 
"multitiles" , after first doing some additional refinements to ensure that these multitiles 
obey some good geometrical properties (specifically, a rank one condition). 

We introduce two large constants $1 \ll C_0 \ll C_1$ (depending on $\Sigma$, and $C_1$ assumed to be large compared to $C_0$) that will be used to sparsify the time-frequency geometry. We will take some care to specify how the implied constants in the $\lesssim$ notation depend on $C_0$ and $C_1$; however we will allow these constants to depend freely on $n, \alpha_1, \ldots, \alpha_n, N, \Sigma$.

It will be convenient to dilate $\theta$ by $C_0$, so that $\theta$ is now supported on $\{\xi \in \Sigma : \|\xi\| \le 4C_0\}$ and is constant on the ball $\{\xi \in \Sigma : \|\xi\| \le C_0/4\}$; this affects our final bounds by some factor depending on $C_0$, but as we shall eventually choose $C_0$ to be a quantity depending only on existing parameters such as $n, \alpha_1, \ldots, \alpha_n, N, \Sigma$, this shall be of no consequence.
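We perform the dyadic decomposition

$$\theta_k(\xi) = \sum_{i \ge k} \varphi_i(\xi)$$

where $\varphi(\xi) := \theta(\xi) - \theta(2\xi)$ is a smooth function supported on an annulus $\|\xi\| \sim C_0$, and $\varphi_i(\xi) := \varphi(2^i \xi)$. Thus

$$T_{\theta_k} = \sum_{i \ge k} T_{\varphi_i}$$

and hence for any $f_1, \ldots, f_n$ we have

$$\int T^*_\theta(f_1, \ldots, f_{n-1})\, f_n = \sum_i \int T_{\varphi_i}(f_1, \ldots, f_{n-1})(x)\, f_n(x)\, 1_{i \ge k(x)}\,dx$$

for some integer-valued measurable function $k : \mathbb{R} \to \mathbb{Z}$. Thus it suffices to establish the multilinearized estimate

$$\Big| \sum_i \int T_{\varphi_i}(f_1, \ldots, f_{n-1})(x)\, f_n(x)\, 1_{i \ge k(x)}\,dx \Big| \lesssim_{C_0, C_1} |E_1|^{\alpha_1} \cdots |E_{n-1}|^{\alpha_{n-1}} \tag{15}$$

for each such function $k : \mathbb{R} \to \mathbb{Z}$, which we now fix. Note that we can write the left-hand side as

$$\Big| \sum_i \int_{\mathbb{R}} \int_\Sigma \Big(\prod_{j=1}^{n} f_{j,i}(x+t_j)\Big) \check\varphi_i(t)\,dt\,dx \Big| \tag{16}$$

where $f_{j,i} := f_j$ for $1 \le j \le n-1$ and $f_{n,i}(x) := f_n(x) 1_{i \ge k(x)}$, and we adopt the convention that $t_n = 0$. One should think of $i$ as a scale parameter, corresponding to the terms with frequency uncertainty $\sim 2^{-i}$ and time uncertainty $\sim 2^i$. Note that the annulus that $\varphi_i$ is supported in has thickness $\sim C_0 2^{-i}$ and can thus tolerate the frequency uncertainty associated to the scale $i$.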
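The telescoping identity behind the dyadic decomposition of $\theta_k$ above can be checked numerically. The sketch below is only an illustration under simplifying assumptions: it works in one dimension (the case $n = 3$, where the hyperplane is a line), it ignores the dilation by $C_0$, and the piecewise-linear `theta` is a hypothetical stand-in for the smooth multiplier.

```python
def theta(xi):
    """Toy multiplier: 1 for |xi| <= 1/4, linear decay to 0 at |xi| = 4.

    It is supported on |xi| <= 4 and constant near the origin, but it is not
    smooth; it is used for illustration only.
    """
    a = abs(xi)
    if a <= 0.25:
        return 1.0
    if a >= 4.0:
        return 0.0
    return (4.0 - a) / 3.75


def phi(xi):
    """Annular piece phi(xi) = theta(xi) - theta(2 xi); it vanishes for |xi| <= 1/8."""
    return theta(xi) - theta(2.0 * xi)


def dyadic_sum(xi, k, extra=60):
    """Sum phi(2^i xi) over k <= i < k + extra.

    The terms vanish once 2^i |xi| >= 4, so for xi != 0 and extra large enough
    this finite sum equals the full sum over i >= k.
    """
    return sum(phi(2.0 ** i * xi) for i in range(k, k + extra))
```

For any non-zero frequency the finite sum returned by `dyadic_sum(xi, k)` should agree with `theta(2.0 ** k * xi)` up to rounding error, which is exactly the statement $\theta_k = \sum_{i \ge k} \varphi_i$ away from the origin.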

The next (standard) step is wave packet decomposition. We shall adopt the usual 
trick of covering the time domain R by three overlapping dyadic grids to eliminate some 
artificial boundary effects caused by dyadicity. 
For each $1 \le j \le n$, let us pick a Schwartz function $\psi_j$ such that $\hat\psi_j$ is supported in $[0.1, 0.9]$, and that $\psi_j$ is rapidly decreasing; in particular we have the bounds

$$|\psi_j(x)| \lesssim (1 + |x|)^{-10N} \quad \text{for all } x \in \mathbb{R} \tag{17}$$

and we have the following property for every $\xi \in \mathbb{R}$:

$$\sum_{l \in \mathbb{Z}} \Big| \hat\psi_j\Big(\xi - \frac{l}{3}\Big) \Big|^2 = 1.$$

This is possible because the translates of $[0.1, 0.9]$ by integer multiples of $\frac{1}{3}$ cover the real line $\mathbb{R}$ with some room to spare for smooth cutoffs. For each scale $i \in \mathbb{Z}$ we can then decompose

$$f_{j,i} = \sum_{m, l \in \mathbb{Z}} \langle f_{j,i}, \psi_{j,i,m,l} \rangle\, \psi_{j,i,m,l}$$

where

$$\psi_{j,i,m,l}(x) := 2^{-i/2}\, \psi_j(2^{-i}x - m)\, e^{2\pi\sqrt{-1}\, 2^{-i} x l/3}$$

and $\langle f, g \rangle := \int f \bar g$ is the usual inner product. Inserting this decomposition into (15), (16) and using the triangle inequality, we reduce to showing that

$$\sum_i \sum_{\vec m, \vec l} C_{\vec m, \vec l, i}\, 2^{i(1 - n/2)} \prod_{j=1}^{n} |\langle f_{j,i}, \psi_{j,i,m_j,l_j} \rangle| \lesssim_{C_0, C_1} |E_1|^{\alpha_1} \cdots |E_{n-1}|^{\alpha_{n-1}} \tag{18}$$

where $\vec m = (m_1, \ldots, m_n)$, $\vec l = (l_1, \ldots, l_n)$, and $C_{\vec m, \vec l, i}$ are the operator coefficients

$$C_{\vec m, \vec l, i} := 2^{-i(1 - n/2)} \Big| \int_{\mathbb{R}} \int_\Sigma \Big(\prod_{j=1}^{n} \psi_{j,i,m_j,l_j}(x + t_j)\Big) \check\varphi_i(t)\,dt\,dx \Big|. \tag{19}$$

One should think of $\vec m$ as containing the time location, and $\vec l$ as containing the frequency location information; roughly speaking, the summand in (18) is the contribution when $f_j$ is localized in space to $2^i m_j + O(2^i)$ and localized in frequency to $2^{-i} l_j + O(2^{-i})$.
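To see concretely why a function $\hat\psi_j$ with the partition property used in the wave packet decomposition above exists, here is a minimal numerical sketch. The construction (a standard bump divided by the sum of its translates) is one possible choice assumed here for illustration, not necessarily the one the authors have in mind; the names `bump`, `psi_hat` and `square_sum` are hypothetical.

```python
import math


def bump(xi):
    """Smooth non-negative bump supported in (0.1, 0.9)."""
    if xi <= 0.1 or xi >= 0.9:
        return 0.0
    return math.exp(-1.0 / ((xi - 0.1) * (0.9 - xi)))


def psi_hat(xi):
    """Candidate for hat(psi_j): sqrt(bump / sum of the 1/3-translates of bump).

    For xi in (0.1, 0.9) only the translates with |l| <= 2 are non-zero, so the
    truncated range below captures the full periodised sum.
    """
    total = sum(bump(xi - l / 3.0) for l in range(-4, 5))
    return math.sqrt(bump(xi) / total) if total > 0.0 else 0.0


def square_sum(xi):
    """Sum over l of |psi_hat(xi - l/3)|^2, restricted to the non-zero terms."""
    lo = math.floor(3.0 * xi) - 4
    return sum(psi_hat(xi - l / 3.0) ** 2 for l in range(lo, lo + 9))
```

Since the translates of $(0.1, 0.9)$ by multiples of $1/3$ cover the line with overlap, the denominator never vanishes and `square_sum` should return 1 (up to rounding) at every frequency.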

We now use the geometry of the hyperplane $\Sigma$ to obtain localization estimates on the coefficients $C_{\vec m, \vec l, i}$. We let $\Gamma \subset \mathbb{R}^n$ denote the hyperplane $\Gamma := \{(\xi_1, \ldots, \xi_n) : \xi_1 + \cdots + \xi_n = 0\}$.

Lemma 4.1. We have the estimate

$$C_{\vec m, \vec l, i} \lesssim_{C_0, C_1} (1 + \mathrm{diam}\{m_1, \ldots, m_n\})^{-N^2}. \tag{20}$$

Furthermore, if $C_{\vec m, \vec l, i}$ is non-zero, then

$$l_1 + \cdots + l_n = O(1) \tag{21}$$

and

$$\|\pi(l_1, \ldots, l_{n-1})\| \sim C_0. \tag{22}$$

Remark 4.1. In the notation of [16], these conditions are essentially asserting that the tuples $(\vec m, \vec l, i)$ with a sizeable coefficient $C_{\vec m, \vec l, i}$ form a collection of multitiles of rank one (which is also the situation with the bilinear Hilbert transform). See also Definition 4.3 below.
Proof We first observe, by rescaling by $2^i$, that $C_{\vec m, \vec l, i}$ is actually independent of $i$. Thus we may assume $i = 0$ throughout the proof.

To prove (20), we then use the physical space representation (19) of $C_{\vec m, \vec l, 0}$, followed by the triangle inequality, to obtain

$$C_{\vec m, \vec l, 0} \le \int_{\mathbb{R}} \int_\Sigma \Big| \prod_{j=1}^{n} \psi_j(x + t_j - m_j) \Big|\, |\check\varphi(t)|\,dt\,dx.$$

Now as $\psi_j$ is rapidly decreasing, we conclude from (14) that

$$C_{\vec m, \vec l, 0} \lesssim_{C_0, C_1} \int_{\mathbb{R}} \int_\Sigma (1 + \|t\|)^{-2N^2} \prod_{j=1}^{n} (1 + |x + t_j - m_j|)^{-2N^2}\,dt\,dx$$

(say), and the claim (20) follows from the pointwise estimate

$$\prod_{j=1}^{n} (1 + |x + t_j - m_j|)^{-N^2} \lesssim (1 + \mathrm{diam}\{m_1, \ldots, m_n\})^{-N^2} (1 + \|t\|)^{N^2}.$$

Now suppose that $C_{\vec m, \vec l, 0}$ is non-zero. To exploit this we use the Fourier representation, converting (19) to

$$C_{\vec m, \vec l, 0} = \Big| \int_\Gamma \Big(\prod_{j=1}^{n} \hat\psi_{j,0,m_j,l_j}(\xi_j)\Big)\, \varphi(\pi(\xi_1, \ldots, \xi_{n-1}))\,d\xi \Big|.$$

Thus there exists $\xi \in \Gamma$ such that $\xi_j - l_j/3$ lies in the support of $\hat\psi_j$ for all $1 \le j \le n$ and $\pi(\xi_1, \ldots, \xi_{n-1})$ lies in the support of $\varphi$. From the former property we have $l_j = 3\xi_j + O(1)$, and (21) follows from the definition of $\Gamma$. From the latter property we have $\|\pi(\xi_1, \ldots, \xi_{n-1})\| \sim C_0$, and the claim follows by using the approximation $l_j = 3\xi_j + O(1)$ and the homogeneity of $\pi$. ■

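In view of the above lemma, it now suffices to show that

$$\sum_{(\vec m, \vec l, i) \in \Omega} (1 + \mathrm{diam}\{m_1, \ldots, m_n\})^{-N^2}\, 2^{i(1 - n/2)} \prod_{j=1}^{n} |\langle f_{j,i}, \psi_{j,i,m_j,l_j} \rangle| \lesssim_{C_0, C_1} |E_1|^{\alpha_1} \cdots |E_{n-1}|^{\alpha_{n-1}} \tag{23}$$

where $\Omega$ is a collection of triples $(\vec m, \vec l, i) \in \mathbb{Z}^n \times \mathbb{Z}^n \times \mathbb{Z}$ obeying (21) and (22).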

We now perform a number of refinements to improve the nesting properties of the set $\Omega$. First we observe that for each $(\vec m, \vec l, i) \in \Omega$ and $1 \le j \le n$, the Fourier transform of $\psi_{j,i,m_j,l_j}$ is supported in the interval $[2^{-i}\frac{l_j}{3}, 2^{-i}(\frac{l_j}{3} + 1)]$ (in fact it is supported in the slightly smaller interval $[2^{-i}(\frac{l_j}{3} + 0.1), 2^{-i}(\frac{l_j}{3} + 0.9)]$). These intervals are almost dyadic, but for the denominator of 3. However this factor of 3 can be eliminated in the following standard manner. Let $\mathcal{D}_0, \mathcal{D}_1, \mathcal{D}_2$ be the dyadic grids

$$\begin{aligned}
\mathcal{D}_0 &:= \{[2^{-i} l,\ 2^{-i}(l+1)] : i, l \in \mathbb{Z}\} \\
\mathcal{D}_1 &:= \{[2^{-i}(l + (-1)^i/3),\ 2^{-i}(l + 1 + (-1)^i/3)] : i, l \in \mathbb{Z}\} \\
\mathcal{D}_2 &:= \{[2^{-i}(l - (-1)^i/3),\ 2^{-i}(l + 1 - (-1)^i/3)] : i, l \in \mathbb{Z}\}.
\end{aligned} \tag{24}$$
Thus $\mathcal{D}_0$ is the standard dyadic grid, and the other two grids are essentially similar (one can view the latter two grids as translates of the first by the non-terminating 2-adic $\pm 1/3$). In particular, within a single grid we have the nesting property that if two intervals intersect, then the shorter one is contained in the longer one. Observe that every interval $[2^{-i}\frac{l_j}{3}, 2^{-i}(\frac{l_j}{3} + 1)]$ belongs to one of these three grids. By pigeonholing once for each $j$ (conceding a factor of $3^n$ in the estimates), we can assume that for fixed $j$, the intervals $[2^{-i}\frac{l_j}{3}, 2^{-i}(\frac{l_j}{3} + 1)]$ belong to a single dyadic grid. For ease of exposition we shall assume that these intervals always lie in the standard dyadic grid $\mathcal{D}_0$; thus the intervals $[2^{-i}\frac{l_j}{3}, 2^{-i}(\frac{l_j}{3} + 1)]$ are genuine dyadic intervals. The other cases are handled similarly but with some minor changes in notation.
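The three-grid trick can be verified by exact rational arithmetic. The following sketch, assuming the description of the grids in (24), checks which grid contains a given frequency interval of length $2^{-i}$ with left endpoint $2^{-i} l/3$; the helper names `shift`, `in_grid` and `grids_containing` are introduced here purely for illustration.

```python
from fractions import Fraction


def shift(g, i):
    """Offset of grid D_g at scale i, in units of 2^{-i}: 0, +(-1)^i/3 or -(-1)^i/3."""
    sign = 1 if i % 2 == 0 else -1
    return (Fraction(0), Fraction(sign, 3), Fraction(-sign, 3))[g]


def in_grid(g, i, left):
    """True if [left, left + 2^{-i}] lies in D_g, i.e. left = 2^{-i} (l + shift), l an integer."""
    return (left * Fraction(2) ** i - shift(g, i)).denominator == 1


def grids_containing(i, l):
    """Indices g of the grids containing [2^{-i} l/3, 2^{-i} (l/3 + 1)]."""
    left = Fraction(2) ** (-i) * Fraction(l, 3)
    return [g for g in range(3) if in_grid(g, i, left)]
```

Each pair $(i, l)$ lands in exactly one grid, determined by the residue of $l$ modulo 3 and the parity of $i$. The alternating sign $(-1)^i$ is what makes each shifted family nested: the two halves of an interval of $\mathcal{D}_g$ at scale $i$ belong to $\mathcal{D}_g$ at scale $i+1$.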

Morally speaking, the localizing factor $(1 + \mathrm{diam}\{m_1, \ldots, m_n\})^{-N^2}$ in (23) implies that the diagonal contribution $m_1 = \cdots = m_n$ is the dominant contribution. Again to simplify the exposition, we shall focus entirely on this diagonal case $m_1 = \cdots = m_n$. We now briefly sketch how to pass from the diagonal case to the general case. Write $m_j = m_1 + r_j$. For each fixed $(n-1)$-tuple of integers $r_2, \ldots, r_n$, one can convert the case $m_j = m_1 + r_j$ to the diagonal case $m_j = m_1$ by shifting the function $\psi_j$ by $r_j$. This affects the bounds (17) but only by $(1 + |r_j|)^{10N}$ at worst. This gives a total loss of $\prod_{j=2}^{n} (1 + |r_j|)^{10nN}$ for this contribution, but one is also gaining a factor of $(1 + \mathrm{diam}(0, r_2, \ldots, r_n))^{-N^2}$, and the product is then summable in $r$ if $N$ is large enough. Thus it suffices to treat the diagonal case.

Another application of the pigeonhole principle (giving up a constant factor of $C_1$ in the estimates) allows one to refine the scale parameter $i$ to not take values in the integers, but to instead take values in a residue class $\{i \equiv c \bmod C_1\}$ for some residue $c$. This "sparsification" of the scales will be useful in obtaining a certain rank separation condition in the frequencies below.

Finally, we analyze the conditions (21) and (22). Observe that if we instead had the exact constraints $l_1 + \cdots + l_n = 0$ and $\pi(l_1, \ldots, l_{n-1}) = 0$, then $(l_1, \ldots, l_n)$ would be restricted to a one-dimensional subspace of $\mathbb{R}^n$. Since $\Sigma$ did not contain $e_1, \ldots, e_{n-1}$ or $(1, \ldots, 1)$, it is easy to see that the non-zero vectors in this one-dimensional subspace have no zero coordinates; thus we have $l_j = c_{j,j'} l_{j'}$ for all $1 \le j, j' \le n$ and some explicit non-zero finite constants $c_{j,j'}$ depending only on $\Sigma$; furthermore we have $c_{j,j} = 1$, $c_{1,j'} + \cdots + c_{n,j'} = 0$ and $\pi(c_{1,j'}, \ldots, c_{n-1,j'}) = 0$. Returning now to the inexact constraints (21), (22), we conclude that

$$l_j = c_{j,j'} l_{j'} + O(C_0)$$

for all $1 \le j, j' \le n$. By pigeonholing (and conceding a factor depending only on $C_0$ and $n$ at worst) we may thus assume that

$$l_j = \lfloor c_{j,j'} l_{j'} \rfloor + a_{j,j'} \tag{25}$$

on $\Omega$ for all $1 \le j, j' \le n$ and some fixed integers $a_{j,j'} = O(C_0)$; note that $a_{j,j}$ is necessarily zero. Thus each frequency $l_j$ is now uniquely determined by any of the other frequencies $l_{j'}$. Furthermore, from (21), (22) we have

$$a_{1,j'} + \cdots + a_{n,j'} = O(1) \quad \text{and} \quad \|\pi(a_{1,j'}, \ldots, a_{n-1,j'})\| \sim C_0.$$

If $C_0$ is large enough, this implies the following basic fact:

Lemma 4.2. For each $j'$, there exist at least two $j$ distinct from $j'$ such that $|a_{j,j'}| \sim C_0$.
The upshot of this lemma is that whenever we fix one of the frequencies of $f_1, \ldots, f_n$, at least two other frequencies depend in a "lacunary" manner on the scale parameter $i$. This fact will be crucial in controlling the geometry of certain "trees" which will appear later.

The estimate (23) has now been reduced to

$$\sum_{(\vec m, \vec l, i) \in \Omega} 2^{i(1 - n/2)} \prod_{j=1}^{n} |\langle f_{j,i}, \psi_{j,i,m_j,l_j} \rangle| \lesssim |E_1|^{\alpha_1} \cdots |E_{n-1}|^{\alpha_{n-1}}. \tag{26}$$

We now convert (26) into the more traditional language of multitiles and wave packets.

Definition 4.2 (Tiles). A tile $P$ is a rectangle $P = I_P \times \omega_P$ with both $I_P$ and $\omega_P$ dyadic intervals, obeying the Heisenberg relation $|I_P| \cdot |\omega_P| = 1$; we refer to $I_P$ as the time interval of $P$ and $\omega_P$ as the frequency interval. A multitile $s$ is an $n$-tuple $s = (s_1, \ldots, s_n)$ of tiles with the same time interval $I_s := I_{s_1} = \cdots = I_{s_n}$. If $I$ is an interval and $C > 0$ is a number, we let $CI$ denote the interval with the same center as $I$ but $C$ times the length (note that this interval will most likely not be dyadic). Let us say that a function $\psi_P$ is a wave packet adapted to a tile $P$ if $\hat\psi_P$ is supported in $0.8\omega_P$ and we have the pointwise estimate

\Mx)\ £ \Ip\- 1/2 xT{x) for all x G R (27) 
where for any interval J, xi is the weight function 

and $c(J)$ is the center of $J$; in particular observe that $\psi_P$ is normalized to have an $L^2$ norm of $O(1)$.

Note that because of all the reductions we have already achieved, every triple $(\vec m, \vec l, i)$ in $\Omega$ gives rise to a multitile $s$ with $s_j := I_{s_j} \times \omega_{s_j} := [2^i m_j, 2^i(m_j + 1)] \times [2^{-i}\frac{l_j}{3}, 2^{-i}(\frac{l_j}{3} + 1)]$. In particular we have $|I_s| = 2^i$. Let $S_{\max}$ denote the collection of all multitiles obtained this way. For each multitile $s \in S_{\max}$ arising from a triple $(\vec m, \vec l, i)$, define the functions $\psi_{s,j}$ for $1 \le j \le n$ by setting

$$\psi_{s,j}(x) := \psi_{j,i,m_j,l_j}(x).$$

Observe that for each $1 \le j \le n$, $\psi_{s,j}$ is a wave packet adapted to $s_j$. We also observe the following important consequence of Lemma 4.2.

Definition 4.3 (Rank one). A collection $S$ of multitiles is said to have rank one if for every $j \in \{1, \ldots, n\}$ there exist distinct $j_1(j), j_2(j) \in \{1, \ldots, n\} \setminus \{j\}$ and signs $\epsilon_1(j), \epsilon_2(j) \in \{-1, +1\}$ (not necessarily distinct) with the following properties.

• (Scale separation) If $s, s' \in S$ are such that $|\omega_{s_j}| > |\omega_{s'_j}|$, then $|\omega_{s_j}| \ge 2^{C_1} |\omega_{s'_j}|$.

• (One independent frequency parameter) If $s, s' \in S$ are such that $\omega_{s_j} = \omega_{s'_j}$, then $\omega_{s_{j'}} = \omega_{s'_{j'}}$ for all $1 \le j' \le n$.

• (Nearby $j$-frequencies implies nearby $j'$-frequencies) If $s, s' \in S$ are such that $10\omega_{s_j} \cap 10\omega_{s'_j} \ne \emptyset$ and $|I_s| \ge |I_{s'}|$, then $\mathrm{dist}(\omega_{s_{j'}}, \omega_{s'_{j'}}) \lesssim C_0 |I_{s'}|^{-1}$ for all $1 \le j' \le n$.

• (Lacunarity property) If $s, s' \in S$ are such that $10\omega_{s_j} \cap 10\omega_{s'_j} \ne \emptyset$ and $|I_s| > |I_{s'}|$, then $\mathrm{dist}(\omega_{s_{j_t}}, \omega_{s'_{j_t}}) \sim C_0 |I_{s'}|^{-1}$ for $t = 1, 2$. In particular $10\omega_{s_{j_t}}$ and $10\omega_{s'_{j_t}}$ are disjoint. Furthermore we require $\epsilon_t(j)(\xi' - \xi) > 0$ for all $\xi \in 10\omega_{s_{j_t}}$ and $\xi' \in 10\omega_{s'_{j_t}}$.

Remark 4.4. For the definition of higher order rank (which we will not need here), see [16]. Actually our definition of rank one is slightly stronger than that in [16] in that we require the indices $j_1, j_2$ to depend only on $j$, and not on $s, s'$, but this is only a minor technical change.

Lemma 4.3 (Rank one property). $S_{\max}$ has rank one.

This lemma shows, among other things, that the multitiles in $S_{\max}$ have essentially one independent frequency parameter. Note that if $S_{\max}$ has rank one, then so does any subset $S$ of $S_{\max}$.

Proof The scale separation property follows since for each multitile $s \in S_{\max}$, we have $|\omega_{s_j}| = |I_s|^{-1} = 2^{-i}$ for all $1 \le j \le n$ and some integer $i \equiv c \bmod C_1$. The remaining properties follow from (25) and Lemma 4.2, setting $j_1, j_2$ to be the indices distinct from $j$ such that $|a_{j_1,j}|, |a_{j_2,j}| \sim C_0$, and $\epsilon_t(j)$ to be the sign of $a_{j_t,j}$. ■

We also define the modified wave packets $\phi_{s,j}$ by setting

$$\phi_{s,j} := \psi_{s,j} \quad \text{for } 1 \le j \le n-1 \tag{28}$$

and

$$\phi_{s,n}(x) := \psi_{s,n}(x)\, 1_{|I_s| \ge 2^{k(x)}}. \tag{29}$$

The estimate (26) can now be rewritten as

$$\sum_{s \in S_{\max}} |I_s|^{1 - n/2} \prod_{j=1}^{n} |\langle f_j, \phi_{s,j} \rangle| \lesssim |E_1|^{\alpha_1} \cdots |E_{n-1}|^{\alpha_{n-1}}.$$

By the monotone convergence theorem we can replace $S_{\max}$ by a finite subset $S$ of $S_{\max}$, so long as our estimates are uniform in $S$. Note that the properties in Lemma 4.3 will be preserved if we pass from $S_{\max}$ to $S$. We can now deduce (26) (and hence Theorem 1.1) from the following more abstract result.

Theorem 4.4 (Fifth reduction). Let $n \ge 3$, let $0 < \alpha_1, \ldots, \alpha_n < 1/2$ with $\alpha_1 + \cdots + \alpha_n = \frac{n-2}{2}$, let $N$ be a sufficiently large integer depending on $\alpha_1, \ldots, \alpha_n$, and let $S$ be a finite collection of multitiles which is rank one. For each $s \in S$ and $1 \le j \le n$, let $\psi_{s,j}$ be a wave packet adapted to $s_j$. Let $k : \mathbb{R} \to \mathbb{Z}$ be an arbitrary measurable function, and let $\phi_{s,j}$ be defined by (28), (29). Let $E_1, \ldots, E_n$ be finite unions of intervals with $|E_n| = 1$, and let $E_n'$ be defined by (12), (13). Then one has

$$\sum_{s \in S} |I_s|^{1 - n/2} \prod_{j=1}^{n} |\langle f_j, \phi_{s,j} \rangle| \lesssim |E_1|^{\alpha_1} \cdots |E_{n-1}|^{\alpha_{n-1}}$$

for all smooth $f_1 \in X_2(E_1), \ldots, f_{n-1} \in X_2(E_{n-1}), f_n \in X_2(E_n')$. The implied constant can depend on $\alpha_1, \ldots, \alpha_{n-1}, N$ and on the bounds in the rank one condition and (27) but is uniform in $S$.
Remark 4.5. If the $\phi_{s,j}$ were replaced by $\psi_{s,j}$ (i.e. if the cutoff $|I_s| \ge 2^{k(x)}$ were not present) then this result would follow from the results in [16]. Thus the novelty (which is also present in [13]) is the cutoff $|I_s| \ge 2^{k(x)}$, which ultimately arises from the maximal function nature of $T^*_{A,\mathbb{R}}$.

5. Trees 

It remains to prove Theorem 4.4. To do this we use the standard strategy of organizing 
the multitiles into trees, estimating the contribution of each tree separately, controlling 
the total number of trees of a certain "size", and then summing up. 

Henceforth we fix the tile collection $S$ and the functions $f_1, \ldots, f_n$ and sets $E_1, \ldots, E_n$, as well as the exponents $\alpha_1, \ldots, \alpha_n$ and $N$, the function $k(x)$ and the wave packet functions $\psi_{s,j}$ (which of course determine $\phi_{s,j}$). We now recall a standard notion of tile order.

Definition 5.1 (Tile order). For any two tiles $P$ and $P'$, we write $P < P'$ if $I_P \subsetneq I_{P'}$ and $3\omega_P \supsetneq 3\omega_{P'}$, and $P \le P'$ if $P < P'$ or $P = P'$.

Note that this is a partial order on tiles. The factor of 3 is convenient for technical reasons to provide a little more frequency separation; the presence of the large constants $C_0$ and $C_1$ in the rank condition will allow us to have this additional factor.
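For readers who prefer a concrete model, the tile order can be sketched in a few lines of Python. This is an illustration only: the class `Tile` and the helper names are hypothetical, tiles are stored by left endpoints and lengths, and the containments are read as strict containment of time intervals together with containment of the tripled frequency intervals.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Tile:
    """Tile I x omega, stored by left endpoints and lengths (|I| * |omega| = 1 assumed)."""
    t0: Fraction
    tlen: Fraction
    w0: Fraction
    wlen: Fraction


def dilate(left, length, c):
    """Interval with the same centre and c times the length, returned as (left, right)."""
    centre = left + length / 2
    return centre - c * length / 2, centre + c * length / 2


def contains(outer, inner):
    """True if the interval inner is contained in the interval outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def strictly_less(p, q):
    """P < P': I_P is strictly inside I_P' and 3 omega_P contains 3 omega_P'."""
    time_p = (p.t0, p.t0 + p.tlen)
    time_q = (q.t0, q.t0 + q.tlen)
    return (contains(time_q, time_p) and time_p != time_q
            and contains(dilate(p.w0, p.wlen, 3), dilate(q.w0, q.wlen, 3)))


def less_or_equal(p, q):
    """P <= P': either P < P' or P = P'."""
    return p == q or strictly_less(p, q)
```

For example, the tile $[0,1] \times [0,1]$ lies below $[0,2] \times [0,\tfrac12]$ in this order, since $3[0,1] = [-1,2]$ contains $3[0,\tfrac12] = [-\tfrac12, 1]$, but it does not lie below $[0,2] \times [5, \tfrac{11}{2}]$.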

Definition 5.2 (Trees). A multitile tree, or tree for short, is a triplet $(\mathbf{T}, T, i)$ where $1 \le i \le n$ is the index of the tree, $T \in S$ is a multitile, and $\mathbf{T} \subset S$ is a collection of multitiles such that $s_i \le T_i$ for all $s \in \mathbf{T}$. We shall often abuse notation and abbreviate a tree $(\mathbf{T}, T, i)$ as $\mathbf{T}$. We refer to $I_{\mathbf{T}} := I_T$ as the time interval of the tree. If $1 \le j \le n$ and $\epsilon \in \{-1, +1\}$, we say that a tree $(\mathbf{T}, T, i)$ is $(j, \epsilon)$-separated if $j = j_t(i)$ and $\epsilon = \epsilon_t(i)$ for some $t \in \{1, 2\}$. We say that a tree is $j$-separated if it is $(j, \epsilon)$-separated for some $\epsilon \in \{-1, +1\}$.

Example 5.1. For any multitile $T$ and $1 \le i \le n$, the singleton tree $(\{T\}, T, i)$ is a multitile tree.

Remark 5.3. We use the rather clumsy terminology multitile tree to distinguish from the notion of a lacunary tree, which consists of tiles rather than multitiles, that we will introduce in Section 9. Note that we do not require that the tree $\mathbf{T}$ contains its top $T$, although this is often the case; also note that if $(\mathbf{T}, T, i)$ is a tree then so is $(\mathbf{T} \cup \{T\}, T, i)$ and $(\mathbf{T} \setminus \{T\}, T, i)$ (so one can always add or remove the top from a tree). This additional flexibility in our definition of tree (not present in some other literature) is convenient because it makes the notion of tree more stable with respect to passage to subsets. In particular, if $(\mathbf{T}, T, i)$ is a $(j, \epsilon)$-separated tree and $\mathbf{T}' \subset \mathbf{T}$, then $(\mathbf{T}', T, i)$ is also a $j$-separated tree. Furthermore, if $\mathbf{T}'$ takes the form $\mathbf{T}' := \{s \in \mathbf{T} : s_i \le T'_i\}$ for some multitile $T'$, then $(\mathbf{T}', T', i)$ is also a $j$-separated tree.

The rank one condition implies certain geometric facts about trees, which we collect 
below for the reader's convenience. 

Lemma 5.2. Let $(\mathbf{T}, T, i)$ and $(\mathbf{T}', T', i)$ be $(j, \epsilon)$-separated multitile trees.

(i) The frequency intervals of a multitile in $\mathbf{T}$ are determined entirely by the size of the spatial interval. In other words, if $s, s' \in \mathbf{T}$ and $|I_s| = |I_{s'}|$, then $\omega_{s_k} = \omega_{s'_k}$ for all $1 \le k \le n$.

(ii) Each multitile in $\mathbf{T}$ has a distinct time interval: if $s, s' \in \mathbf{T}$ and $s \ne s'$, then $I_s \ne I_{s'}$.

(iii) If $s \in \mathbf{T}$ and $s \ne T$, then $\mathrm{dist}(10\omega_{s_j}, 10\omega_{T_j}) \sim C_0 |I_s|^{-1}$; in particular, $10\omega_{s_j}$ and $10\omega_{T_j}$ are disjoint.

(iv) Suppose that $s \in \mathbf{T}$ and $s' \in \mathbf{T}'$ are such that $\omega_{s_j} \subsetneq \omega_{s'_j}$ and $I_{s'} \cap I_T \ne \emptyset$. Then $s'_j \le T_j$; and furthermore we have $\epsilon(\xi - \xi') > 0$ whenever $\xi \in \omega_{s_j}$ and $\xi' \in \omega_{T'_j}$.

Proof If $|I_s| = |I_{s'}|$ then $|\omega_{s_i}| = |\omega_{s'_i}|$; since these intervals intersect, we must have $\omega_{s_i} = \omega_{s'_i}$, and then (i) follows from the rank one condition. Property (ii) follows immediately from (i). Now we show (iii). From (ii) we see that $I_s$ is strictly smaller than $I_T$, and so $\omega_{s_i}$ strictly contains $\omega_{T_i}$. The claim then follows from the lacunarity property of the rank condition. Finally, we show (iv). We have $|\omega_{s_j}| < |\omega_{s'_j}|$ and $|I_s| \le |I_T|$ and hence $|I_{s'}| < |I_T|$. By dyadic nesting this means that $I_{s'} \subsetneq I_T$, and to show that $s'_j \le T_j$ it will suffice to show that $3\omega_{s'_j}$ intersects $3\omega_{T_j}$. But $\omega_{T_j}$ lies within $\lesssim C_0 |I_s|^{-1}$ of $\omega_{s_j}$, which is contained inside $\omega_{s'_j}$. Since $|\omega_{s'_j}| \ge 2^{C_1} |\omega_{s_j}|$ by scale separation, the claim $s'_j \le T_j$ follows if $C_1$ is sufficiently large depending on $C_0$. To show the remaining claim in (iv), we observe from the rank separation condition that $\mathrm{dist}(\omega_{s'_j}, \omega_{T'_j}) \sim C_0 |\omega_{s'_j}|$, with $\omega_{T'_j}$ lying below $\omega_{s'_j}$ if $\epsilon = +1$ and above if $\epsilon = -1$. The claim follows. ■

We can now introduce the concept of size. There will be one size for each of the functions $f_1, \ldots, f_n$.

Definition 5.4 (Size). For a set of multitiles $S' \subset S$ and $1 \le j \le n$ define its $j$-size as

$$\mathrm{size}_j(S') := \sup_{\mathbf{T}} \Big( \frac{1}{|I_T|} \sum_{s \in \mathbf{T}} |\langle f_j, \phi_{s,j} \rangle|^2 \Big)^{1/2}$$

where the supremum is taken over all the $j$-separated trees $(\mathbf{T}, T, i)$ with $\mathbf{T} \subset S'$.

Remark 5.5. In the above definition the trees T are not required to contain their top T. 
However it is easy to see that a tree without a top can be partitioned into trees with tops 
that have disjoint time intervals, and because of this one could replace the supremum in 
the definition of size by a supremum over trees that contain their tops without affecting 
the size. However we will not need to do this in this paper. 

6. High-level overview of proof 

Following the usual time-frequency approach, we can now reduce the task of proving 
Theorem 4.4 to that of verifying a number of lemmas concerning trees. 
The first lemma is easy to state and prove: 

Lemma 6.1 (Contribution of a single tree). If $(\mathbf{T}, T, i_0)$ is a tree then

$$\sum_{s \in \mathbf{T}} |I_s|^{1 - n/2} \prod_{i=1}^{n} |\langle f_i, \phi_{s,i} \rangle| \lesssim |I_T| \prod_{i=1}^{n} \mathrm{size}_i(\mathbf{T}).$$

Proof By definition of size we have

$$\Big( \sum_{s \in \mathbf{T}} |\langle f_i, \phi_{s,i} \rangle|^2 \Big)^{1/2} \le |I_T|^{1/2}\, \mathrm{size}_i(\mathbf{T})$$

when $i = j_1(i_0)$ or $i = j_2(i_0)$. Also, since a singleton multitile is always a tree, we also have

$$|\langle f_i, \phi_{s,i} \rangle| \le |I_s|^{1/2}\, \mathrm{size}_i(\mathbf{T})$$

for the other $n-2$ values of $i$. The claim then follows from Hölder's inequality. ■

In light of this lemma, the task is now to subdivide the collection $S$ into distinct trees $\mathbf{T}$ for which one has the bound

$$\sum_{\mathbf{T}} |I_T| \prod_{i=1}^{n} \mathrm{size}_i(\mathbf{T}) \lesssim |E_1|^{\alpha_1} \cdots |E_{n-1}|^{\alpha_{n-1}}. \tag{30}$$

This will be accomplished via a number of propositions. First we need a basic upper 
bound on the size of a tree, which we prove in Section 7. 

Proposition 6.2 (Size estimate). Let $1 \le j \le n$, let $S'$ be a collection of multitiles in $S$, and let

$$\mathcal{D}_{S'} := \{I \text{ dyadic} : I_s \subset I \subset I_{s'} \text{ for some } s, s' \in S'\}$$

be the time convexification of $S'$. Then

size,(S') < l^f/ 2 sup ^ / x? ■ 

Note that this bound is consistent with the hypothesis $f_j \in X_2(E_j)$ and the intuition
that the $j$-size is something like a BMO average of $f_j$.

To decompose the collection of multitiles S into trees, we need the following result. 

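Lemma 6.3 (Splitting lemma). Let $\mathbf{S}'$ be a finite collection of multitiles, $1 \le j \le n$,
and suppose that $\mathrm{size}_j(\mathbf{S}') \le 2^{m+1}$. Let $\mu > 0$ and suppose that $N$ is sufficiently large
depending on $\mu$. Then $\mathbf{S}'$ can be written as a disjoint union

$$\mathbf{S}' = \Big( \bigcup_{\mathbf{T} \in \mathcal{F}} \mathbf{T} \Big) \cup \mathbf{S}_2 \qquad (31)$$

where $\mathcal{F}$ is a collection of trees such that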

/IP 1 1/2 X i 

E|/tI<,2- 2 '"(^) , (32) 

while

$$\mathrm{size}_j(\mathbf{S}_2) \le 2^m. \qquad (33)$$

This lemma is quite difficult and will be proven in Sections 8-13. Assuming the lemma
for the moment, we may iterate it in the standard way (see e.g. [18]) to conclude

Corollary 6.4 (Tree selection algorithm). Let $\mathbf{S}'$ be a finite collection of multitiles and
$1 \le j \le n$. Let $\mu > 0$ and suppose that $N$ is sufficiently large depending on $\mu$. Then, after
discarding tiles $s$ of $j$-size zero (in the sense that $\langle f_j, \phi_{s,j} \rangle = 0$), there exists a partition

$$\mathbf{S}' = \bigcup_{m : 2^m \le \mathrm{size}_j(\mathbf{S}')} \ \bigcup_{\mathbf{T} \in \mathcal{F}^{m,j}} \mathbf{T}$$

where for each $m$, $\mathcal{F}^{m,j}$ is a collection of trees such that $\mathrm{size}_j(\mathbf{T}) \le 2^{m+1}$ and

E 1^1 <„ 2— (M-) . (34) 
Now we prove (30). It will suffice for each $l \ge 0$ to prove the stronger estimate

$$\sum_{\mathbf{T}} |I_T| \prod_{i=1}^n \mathrm{size}_i(\mathbf{T}) \lesssim 2^{-l} |E_1|^{a_1} \cdots |E_n|^{a_n} \qquad (35)$$

under the additional assumption that 

2' < 1 + ^£1^MS < 2^ (36) 

for all tiles $s \in \mathbf{S}$, since the original claim (30) then follows by dyadic decomposition of $\mathbf{S}$.
From (36) and Proposition 6.2 we have

size^S) < \Ei\h 1 for 1 < i < n (37) 

and 

size n (S) < (38) 

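Now use the selection algorithm in Corollary 6.4 for $\mathbf{S}$ to get for each $i$ the collections of
trees $\mathcal{F}^{m,i}$; the tiles of $i$-size zero can be safely discarded (viewing them as singleton trees)
as they make no contribution to (35). We can then partition

$$\mathbf{S} = \bigcup_{m_1, \ldots, m_n} \mathbf{S}^{m_1,1} \cap \ldots \cap \mathbf{S}^{m_n,n}$$

where $\mathbf{S}^{m,i} := \bigcup_{\mathbf{T} \in \mathcal{F}^{m,i}} \mathbf{T}$ and we implicitly assume that

$$2^{m_i} \le \mathrm{size}_i(\mathbf{S}). \qquad (39)$$

By pigeonholing we can restrict to the case when $m_j = \max(m_1, \ldots, m_n)$ for some fixed
$1 \le j \le n$. We then have the partition

$$\mathbf{S} = \bigcup_{m_1, \ldots, m_n : m_j = \max(m_1, \ldots, m_n)} \ \bigcup_{\mathbf{T} \in \mathcal{F}^{m_j,j}} \big( \mathbf{T} \cap \mathbf{S}^{m_1,1} \cap \ldots \cap \mathbf{S}^{m_n,n} \big).$$

Note that $\mathbf{T} \cap \mathbf{S}^{m_1,1} \cap \ldots \cap \mathbf{S}^{m_n,n}$ is a tree with the same top as $\mathbf{T}$, and with $j$-size at most
$2^{m_j+1}$; this tree need not contain its top, but this is of no consequence for us. To verify
(35) it thus suffices to show that

$$\sum_{m_1, \ldots, m_n : m_j = \max(m_1, \ldots, m_n)} \ \sum_{\mathbf{T} \in \mathcal{F}^{m_j,j}} |I_T| \, 2^{m_1} \cdots 2^{m_n} \lesssim 2^{-l} |E_1|^{a_1} \cdots |E_n|^{a_n}. \qquad (40)$$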

Meanwhile, from (34) we have 



e \h\<^m\ 
where $\mu$ is a large parameter to be chosen later. Also, from (39) we have

T 11 ... 2 m " < T 1 ! JJsizei(S) 2ai 2 {1 - 2ai)m \ 

From these bounds and summing the geometric series in all the $m_i$ for $i \ne j$, we have

J2 E i^i2 mi ...2 m "< M 

mi,... ,m„:mj=max(mi,... ,m„) tg^" 1 ^ 

JJ size^S) 2 ^ J] 2 m ^ ( Y[ 2^ 2a >) m i)2- 2m i 
Since $a_1 + \ldots + a_n = (n-2)/2$, we can rewrite the right-hand side as

2 

IJsize^S) 201 ^ 2 ^ 
Summing the geometric series, we can bound this (for $\mu$ sufficiently large) by

Applying (37), (38) we obtain (40) as desired, if $N$ and $\mu$ are chosen sufficiently large.
This proves Theorem 4.4 and hence Theorem 1.1. 

It remains to prove Proposition 6.2 and Lemma 6.3. This will occupy the remainder of
the paper.

7. Single tree size estimate 

In this section we prove Proposition 6.2. This estimate is well known in the case $j \le n - 1$,
when the cutoff $|I_s| > 2^{k(x)}$ has no effect; see [18, Lemma 6.8]. Thus we shall focus instead
on the more difficult case $j = n$.

Our task is to show that 

EK/-^)I 2 ^I 7 tII^I(sup^ / x f) 2 

seT iev T \l\J Ej 

for each $n$-separated multitile tree $(\mathbf{T}, T, i)$.

Fix $(\mathbf{T}, T, i)$. By frequency translation invariance we may assume that $0 \in \omega_{T,n}$. If $f_n$
is supported outside $2I_T$ then from the decay of $\phi_{s,n}$ we get

\(fnAs,n)\< (j^)Vr*jf Xl 

for all $s \in \mathbf{T}$, which proves the result in this case. Thus we may assume without loss of
generality that $f_n$ is supported on $2I_T$.

Using duality it hence suffices to prove that 

TT7l\ [ fny2 a s<l)s,n\ < SUp t-t / x7 

\It\ 2 J sGT J eP T Ml Je„ 

for all $(a_s)_{s \in \mathbf{T}}$ with $\|(a_s)\|_{\ell^2(\mathbf{T})} \le 1$.

Fix the $a_s$. We can estimate

$$\Big| \sum_{s \in \mathbf{T}} a_s \phi_{s,n}(x) \Big| \le \sup_k \Big| \sum_{s \in \mathbf{T} : |I_s| > 2^k} a_s \psi_{s,n}(x) \Big|.$$

Since $\mathbf{T}$ is $n$-separated, we see from Lemma 5.2 that the tiles $s \in \mathbf{T}$ with $|I_s| > 2^k$ have
a disjoint frequency support from the tiles $s \in \mathbf{T}$ with $|I_s| \le 2^k$. Indeed we can write
$\sum_{s \in \mathbf{T} : |I_s| > 2^k} a_s \psi_{s,n}(x)$ as a Fourier multiplier applied to the function
$F := \sum_{s \in \mathbf{T}} a_s \psi_{s,n}(x)$,
where the symbol of the multiplier is a cutoff smoothly adapted to an interval of length
$\sim C_0 2^{-k}$. From this and standard kernel estimates, we conclude that

$$\sup_k \Big| \sum_{s \in \mathbf{T} : |I_s| > 2^k} a_s \psi_{s,n}(x) \Big| \lesssim M F(x),$$

and so it will suffice to show that 

-V / /M(F) < sup Ir [ X f. (41) 

\1t\ 2 J I&P-r \ 1 I JE 

For a dyadic interval $J$ denote by $J_1, J_2, J_3$ the three dyadic intervals of the same length
as $J$ sitting at the left of $J$, with $J_3$ being adjacent to $J$. Similarly let $J_5, J_6, J_7$ be the
three dyadic intervals of the same length as $J$ sitting at the right of $J$, with $J_5$ being
adjacent to $J$. Also define $J_4 := J$. Let $\mathcal{J}$ be the set of all dyadic intervals $J$ with the
following properties:

(a) $J \cap 2I_T \ne \emptyset$

(b) $\nexists\, I \in \mathcal{P}_{\mathbf{T}}$ : $|I| < |J|$ and $I \subset 3J$

(c) $J_i \in \mathcal{P}_{\mathbf{T}}$ for some $1 \le i \le 7$.

We claim that $2I_T \subset \bigcup_{J \in \mathcal{J}} J$. Indeed, assume by contradiction that there exists some
$x \in 2I_T \setminus \bigcup_{J \in \mathcal{J}} J$. Let $J^{(0)} \subset J^{(1)} \subset J^{(2)} \subset \ldots$ be the sequence of dyadic intervals of
consecutive lengths containing $x$, with $|J^{(0)}| = \min_{I \in \mathcal{P}_{\mathbf{T}}} |I|$. Since $J^{(0)} \notin \mathcal{J}$ and since (a)
and (b) are certainly satisfied for $J^{(0)}$, it follows that $J^{(0)}_i \notin \mathcal{P}_{\mathbf{T}}$ for each $1 \le i \le 7$.
Moreover, note that for each $1 \le i \le 7$ there is no $I \in \mathcal{P}_{\mathbf{T}}$ with $I \subset J^{(0)}_i$. We proceed
now by induction. Assume that for some $j \ge 0$ we proved that for each $1 \le i \le 7$
we have $J^{(j)}_i \notin \mathcal{P}_{\mathbf{T}}$ and also that there is no $I \in \mathcal{P}_{\mathbf{T}}$ with $I \subset J^{(j)}_i$. Note that this
implies the same for $j + 1$. Indeed, since $3J^{(j+1)} \subset 7J^{(j)}$ and by induction hypothesis,
it follows that (b) is satisfied for $J^{(j+1)}$. Hence $J^{(j+1)}_i \notin \mathcal{P}_{\mathbf{T}}$ for each $1 \le i \le 7$. We
verify now the second statement of the induction. Note that if there was an $I \in \mathcal{P}_{\mathbf{T}}$ with
$I \subset J^{(j+1)}_i$ then the hypothesis of the induction and the fact that $3J^{(j+1)} \subset 7J^{(j)}$ would
imply that $i \in \{1, 2, 6, 7\}$. Hence $I \subset J^{(j+1)}_i \subset I_T$, and by convexity of $\mathcal{P}_{\mathbf{T}}$ it would follow
that $J^{(j+1)}_i \in \mathcal{P}_{\mathbf{T}}$, impossible. This closes the induction. To see how the claim follows
from here, observe that $I_T = J^{(j)}_i$ for some $i, j$, which certainly contradicts the fact that
$I_T \in \mathcal{P}_{\mathbf{T}}$.

The next thing we prove is that on each interval $2J$ with $J \in \mathcal{J}$, the oscillation of $F$ is well
controlled. More exactly we will show that for each $x, y \in 2J$, $|F(x) - F(y)| \lesssim |J|^{-1/2}$. We
have

\F(x)-F(y)\<\J\J2\sup\^ n (z)\\a s \ 



< 



since by definition there exists no I s C 3J. Now 



t^Z /J 2 



i j i E 7^xy s (c(j))\a s \<\j\(Y: ki¥(EE;4^ 

s£T \ 1 s\ 2 sG T 2 fe >|J| *>1 



Ul 2 



and also 



i j i E ^i^( c ( J ))i«^(E N 2 ) f (E E^) 1 

i/'IT^i s \i s s?<\j\ 2k ^\ J \'i>^ 

< j_ 

~ I Tl I ' 

due to the fact that there exists no $I_s \subset 3J$ with $|I_s| < |J|$.

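Define now the measure space $X = \bigcup_{J \in \mathcal{J}} J$ and its $\sigma$-algebra $\mathcal{T}$ generated by the
maximal intervals $J \in \mathcal{J}$. Recall that $2I_T \subset \bigcup_{J \in \mathcal{J}} J = X \subset 10 I_T$. We will see that for
each $x \in J$

$$M(F)(x) \lesssim \frac{1}{|J|} \int_J M(F)(z) \, dz + \frac{1}{|J|^{1/2}}. \qquad (42)$$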



Indeed, if r > ||J|, 



1 / |F|(^<infM(F)(y) 
2r /___ y&J 



< 1 



1^1 

On the other hand, if r < || J|, 



M(F)(z)dz. 



1 

2r 



Vl(^<sup|F|(y) 

x-r J/62J 

<mf|F|(y) + -L 

J/GJ |J|2 
From (42) we can write

77V / fM ^ ~ TTF / / E ( M ( F )I T ) + SU P T7T / / 

|irP ■/ MtP \J\ J j 

= -rK f E(/|T)E(M(F)|T) + sup^- / / 
\1t\ 2 Jx JeJ \ J \Jj 

-L-HEtflT)!!*. / E(M(F)|T) + sup^- / X ^ 
Mr 1 2 \ J \ Je 

E(M(F)|T) 2 



< 



< 1 + 



< sup^- / Xj 







sup 







where $E(\cdot|\mathcal{T})$ denotes the conditional expectation relative to $\mathcal{T}$. Finally, note that since
for each $J \in \mathcal{J}$, $J_i \in \mathcal{P}_{\mathbf{T}}$ for some $i$, we have that

sup — / Xj < sup — / Xi , 
JeJ \ J\ Je iev T M | 

which yields (41). This concludes the proof of Proposition 6.2. ■



8. Reduction to Bessel inequality 

We still have to prove Lemma 6.3. This will be achieved by means of a certain maximal 
Bessel inequality and a stopping time argument. We first recall a definition. 

Definition 8.1. [16] Let $j \in \{1, 2, \ldots, n\}$. Two $j$-separated multitile trees $(\mathbf{T}, T, i)$ and
$(\mathbf{T}', T', i)$ with the same index are said to be strongly $j$-disjoint if $\mathbf{T} \cap \mathbf{T}' = \emptyset$, and
furthermore whenever $s \in \mathbf{T}$, $s' \in \mathbf{T}'$ are such that $\omega_{s_j} \subset \omega_{s'_j}$, then one has $I_T \cap I_{s'} = \emptyset$,
and similarly with $\mathbf{T}$ and $\mathbf{T}'$ reversed. A collection of $j$-separated multitile trees is called
mutually strongly $j$-disjoint if each two multitile trees in the collection are strongly
$j$-disjoint.

Remark 8.2. If two $j$-separated multitile trees $(\mathbf{T}, T, i)$ and $(\mathbf{T}', T', i)$ are strongly
$j$-disjoint, then one has $s_j \cap s'_j = \emptyset$ for each $s \in \mathbf{T}$, $s' \in \mathbf{T}'$. This is because if $s_j$ and $s'_j$
intersect, then since $\mathbf{T} \cap \mathbf{T}' = \emptyset$, we must have either $\omega_{s_j} \subset \omega_{s'_j}$ or $\omega_{s'_j} \subset \omega_{s_j}$, and the
claim then follows from the definition of strong $j$-disjointness. This may help explain the
terminology "strong $j$-disjointness".

The next estimate controls the extent to which disjoint trees can each absorb a lot of 
L 2 energy. It is the main technical estimate used in the proof, and the core of Lacey's 
original argument in [13]. The proof is rather difficult and will occupy the remainder of 
this paper. 

Theorem 8.1 (Maximal Bessel inequality, multitile version). Let $\mathcal{F}$ be a finite collection
of strongly $j$-disjoint, $j$-separated multitile trees. Let $\mu > 0$ and suppose that $N$ is
sufficiently large depending on $\mu$. Assume also that

$$2^m \le \Big( \frac{1}{|I_T|} \sum_{s \in \mathbf{T}} |\langle f_j, \phi_{s,j} \rangle|^2 \Big)^{1/2} \le 2^{m+1} \qquad (43)$$



and 



1 



£ K/f.*.j>l : 

v ^ c/ T / y 



■ 1 s6T 



< 2 m+1 (44) 



for each $\mathbf{T}' \subset \mathbf{T} \in \mathcal{F}$. Then, if $\mu > 0$ and $N$ is sufficiently large depending on $\mu$, we
have

/ I p. 1 1/2 \ | 

Ri_ ■ (45) 
Te.F v ' 

1/2 — 

Remark 8.3. The factor (V^y in (45) is technical and should be ignored. Intuitively, 

the condition (43) asserts that the function $f_j$, when "restricted" to a tree $\mathbf{T}$ in $\mathcal{F}$, has $L^2$
norm roughly comparable to $2^m |I_T|^{1/2}$. The strong disjointness of the trees is an assertion
that these restrictions are in some sense "almost orthogonal". Since $f_j$ has an $L^2$ norm of
$O(1)$, we see that (45) is indeed a kind of Bessel inequality. This estimate is standard (and
fairly straightforward) when $j \ne n$, but when $j = n$ the presence of the cutoff $1_{|I_s| > 2^{k(x)}}$ in
the modified wave packet $\phi_{s,n}$ presents some significant difficulties (already encountered
in [13]).

Let us now show how Lemma 6.3 follows from Theorem 8.1. This will be a standard 
stopping time argument of the type which has been commonly used in time-frequency 
analysis, see for instance [16], but for sake of completeness we present the argument here. 

We perform the following algorithm to construct $\mathbf{S}_2$ and $\mathcal{F}$.

• Step 0. Initialize $\mathcal{F}$ to be empty, and $\mathbf{S}_2$ to equal $\mathbf{S}'$.

• Step 1. If $\mathrm{size}_j(\mathbf{S}_2) \le 2^m$, then we terminate the algorithm. Otherwise, we have

$2^m < \mathrm{size}_j(\mathbf{S}_2) \le \mathrm{size}_j(\mathbf{S}') \le 2^{m+1}$.

By definition of size, we can find a $j$-separated multitile tree $\mathbf{T} = (\mathbf{T}, T, i)$ in $\mathbf{S}_2$
obeying (44).

• Step 2. The multitile tree $(\mathbf{T}, T, i)$ mentioned above is a $(j, \epsilon)$-separated tree
for some $\epsilon = \pm 1$. For fixed $i$ and $\epsilon$, we may assume that this tree maximizes
the quantity $\epsilon \xi_{T_j}$, where $\xi_{T_j}$ is the center of the frequency tile $T_j$, subject to the
constraints (44) and $\mathbf{T} \subset \mathbf{S}_2$.

• Step 3. Clearly the multitile tree $\mathbf{T}$ is non-empty, since it has positive size. Add
the multitile tree $\mathbf{T}$ to the collection $\mathcal{F}$, and delete the multitiles in $\mathbf{T}$ from $\mathbf{S}_2$.
Note that this removes at least one multitile from $\mathbf{S}_2$.

• Step 4. Next, define the (possibly empty) companion tree $(\tilde{\mathbf{T}}, T, j)$ where $\tilde{\mathbf{T}} :=
\{s \in \mathbf{S}_2 : s_j \le T_j\}$, add this tree $\tilde{\mathbf{T}}$ to $\mathcal{F}$ also, and delete the multitiles in $\tilde{\mathbf{T}}$ from
$\mathbf{S}_2$. Then return to Step 1.
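
The greedy structure of this selection can be illustrated on a heavily simplified toy model. The sketch below is not the algorithm of the paper: it assumes a single frequency, so a "tree" is just the set of tiles whose dyadic time interval lies under a chosen top interval, there is no separation condition, no companion tree, and the frequency-ordering rule of Step 2 is replaced by "take the longest violating top". The names `select_trees`, `size_squared` and the `(k, n)` encoding of dyadic intervals are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction

# A dyadic interval [n * 2**k, (n + 1) * 2**k) is encoded as the pair (k, n).
# `tiles` maps each interval to a coefficient c_s standing in for |<f, phi_s>|.

def length(interval):
    k, _ = interval
    return Fraction(2) ** k

def contains(outer, inner):
    """True if the dyadic interval `inner` is a subset of `outer`."""
    (K, N), (k, n) = outer, inner
    return k <= K and (n >> (K - k)) == N

def tree_energy(tiles, top):
    """Sum of c_s**2 over the tiles s lying under `top`."""
    return sum(c * c for s, c in tiles.items() if contains(top, s))

def size_squared(tiles):
    """Square of the toy size: max over tops of energy(top) / |top|.

    Tops are restricted to tile intervals; for dyadic intervals this does
    not change the supremum, since any other top is covered by disjoint
    maximal tile intervals.
    """
    return max((tree_energy(tiles, top) / length(top) for top in tiles),
               default=Fraction(0))

def select_trees(tiles, threshold_sq):
    """Remove trees until the remaining tiles have size**2 <= threshold_sq."""
    remaining = dict(tiles)
    trees = []
    while True:
        violators = [top for top in remaining
                     if tree_energy(remaining, top) / length(top) > threshold_sq]
        if not violators:            # analogue of the test in Step 1
            return trees, remaining
        top = max(violators, key=length)
        tree = {s: c for s, c in remaining.items() if contains(top, s)}
        trees.append((top, tree))    # analogue of Step 3
        for s in tree:
            del remaining[s]

tiles = {
    (2, 0): Fraction(1), (1, 0): Fraction(1), (0, 0): Fraction(2),
    (0, 1): Fraction(1), (0, 3): Fraction(1), (1, 1): Fraction(3),
    (0, 10): Fraction(1), (0, 20): Fraction(2),
}
trees, remaining = select_trees(tiles, Fraction(2))
```

In this toy model the selected trees are disjoint sets of tiles, each with energy larger than `threshold_sq * |top|`, so the total length of the tops is trivially bounded by the total energy divided by `threshold_sq`. The substance of Theorem 8.1 is that a bound of this type survives in the genuine multi-frequency setting, where orthogonality is only approximate.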
This algorithm terminates in finite time since $\mathbf{S}_2$ was initially finite, and every iteration
of the algorithm removes at least one multitile from $\mathbf{S}_2$. It is also clear that this algorithm
will obtain a decomposition (31) obeying (33). The only remaining task is to verify the
bound (32). It suffices to do this for each fixed $1 \le i \le n$, thus restricting the summation
to those trees $\mathbf{T} = (\mathbf{T}, T, i)$ in $\mathcal{F}$ with index $i$. This in turn fixes the quantity $\epsilon$ appearing
in Step 2 above, so if one indexes the trees $\mathbf{T}$ in the order that they are added to $\mathcal{F}$ then
$\epsilon \xi_{T_j}$ will be non-increasing. We also only need to focus on those trees selected using Step
3 rather than Step 4, since the trees in Step 4 have the same time interval as those in
Step 3 and so we are only giving up a factor of 2 by doing this.

To prove (32), it suffices by Theorem 8.1 to show that the trees in $\mathcal{F}$ with fixed $i$ and $\epsilon$
arising from Step 3 are mutually strongly $j$-disjoint. Suppose for contradiction that there
were two trees $\mathbf{T}, \mathbf{T}'$ in $\mathcal{F}$ of this type which were not strongly $j$-disjoint. Since these trees
have distinct multitiles by construction, the only way that strong $j$-disjointness can fail
(up to swapping $\mathbf{T}$ and $\mathbf{T}'$) is if there exist $s \in \mathbf{T}$, $s' \in \mathbf{T}'$ with $\omega_{s_j} \subset \omega_{s'_j}$ and $I_T \cap I_{s'} \ne \emptyset$.
From Lemma 5.2 we conclude that $s'_j \le T_j$ and $\epsilon(\xi_{T_j} - \xi_{T'_j}) > 0$. The latter condition,
combined with the non-increasing nature of the $\epsilon \xi_{T_j}$, ensures that $\mathbf{T}$ was selected earlier in
the algorithm than $\mathbf{T}'$. But then $s'$ would have been selected in the companion tree of $\mathbf{T}$ and
could not have remained in $\mathbf{S}_2$ by the time $\mathbf{T}'$ was selected, a contradiction. This ensures
the strong $j$-disjointness and concludes the deduction of Lemma 6.3 from Theorem 8.1.

It remains to prove Theorem 8.1. This will occupy the remainder of the paper. 

9. Good-$\lambda$ reduction

The only remaining task in the proof of Theorem 1.1 is the maximal Bessel inequality 
in Theorem 8.1. This will be accomplished in stages. In this section we rephrase the 
inequality as an inequality concerning tiles rather than multitiles, and use some "BMO 

2 

theory" for tiles to replace the ( 1 ^ m j factor in (45) by a factor which depends instead 

on the counting function $N_{\mathcal{F}}$. This BMO theory is quite elementary and may have some
independent interest.

We will focus on the hardest case $j = n$, in which one must deal with the presence of
the cutoff $1_{|I_s| > 2^{k(x)}}$ in the modified wave packet $\phi_{s,n}$. The cases $j \ne n$ are significantly
simpler (see for instance [16]) and in any event can be handled by the argument here (e.g.
by the artificial expedient of setting $k(x)$ to be so low that the cutoff $1_{|I_s| > 2^{k(x)}}$ disappears).

The Bessel inequality is now really only a statement about the n-tiles of the multitiles 
in S, and so we shall introduce new notation to focus only on these tiles rather than on 
the multitiles. 

Definition 9.1 (Lacunary tree). A lacunary tree $T = (T, I_T, \xi_T)$ is a collection $T$ of tiles,
together with a dyadic time interval $I_T \in \mathcal{D}$ and a center frequency $\xi_T \in \mathbb{R}$, such that
for all $P \in T$ we have $I_P \subset I_T$ and $\mathrm{dist}(\omega_P, \xi_T) \sim C_0 |\omega_P|$, and such that the frequency
interval $\omega_P$ of a tile is determined entirely by the length of the time interval, thus if
$P, P' \in T$ and $|I_P| = |I_{P'}|$ then $\omega_P = \omega_{P'}$. (In particular, this means that distinct tiles in
$T$ have distinct time intervals.) We say that one lacunary tree $(T', I_{T'}, \xi_{T'})$ is a subtree of
another $(T, I_T, \xi_T)$ if $T' \subset T$ (thus we allow subtrees to have a different time interval
and center than the supertree). We say that two lacunary trees $(T, I_T, \xi_T)$, $(T', I_{T'}, \xi_{T'})$
are strongly disjoint if $T \cap T' = \emptyset$, and whenever $P \in T$, $P' \in T'$ are such that $\omega_P \subset \omega_{P'}$,
then one has $I_T \cap I_{P'} = \emptyset$, and similarly with $T$ and $T'$ reversed. We define a forest to be
any collection $\mathcal{F}$ of lacunary trees such that any two distinct trees $T, T'$ in $\mathcal{F}$ are strongly
disjoint.

Observe from Lemma 5.2 that if (T, T, i) is an n-separated multitile tree, then (T„, I Tn , ^T n ,n) 
is a lacunary tree, where T n := {s n : s G T} is the set of n-tiles of the multitile tree T, 
and £ Tn ,n is a frequency such that dist(uj Tn ,n,t,T n ,n) ~ C \uj Tn ,n\- Furthermore, if (T,T,i) 
and (T,T',i) are strongly n-disjoint, then (T„, I Tn , £ Tnin ) and (T n ,T^,^ n ) are strongly 
disjoint. Thus, we can deduce Theorem 8.1 from 

Theorem 9.1 (Maximal Bessel inequality, first reduction). Let $\mathcal{F}$ be a forest. Let $\mu > 0$
and suppose that $N$ is sufficiently large depending on $\mu$. For each tile $P$ in $\bigcup_{T \in \mathcal{F}} T$ let
$\psi_P$ be a wave packet adapted to $P$, and let $\phi_P$ be the function

$$\phi_P(x) := 1_{|I_P| > 2^{k(x)}} \psi_P(x).$$

Let $E$ be a finite union of intervals, and let $f \in X_2(E)$ be such that



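$$2^m \le \Big( \frac{1}{|I_T|} \sum_{P \in T} |\langle f, \phi_P \rangle|^2 \Big)^{1/2} \le 2^{m+1}$$

and

$$\Big( \frac{1}{|I_{T'}|} \sum_{P \in T : I_P \subset I_{T'}} |\langle f, \phi_P \rangle|^2 \Big)^{1/2} \le 2^{m+1}$$

for each $T' \subset T \in \mathcal{F}$. Then we have

$$\sum_{T \in \mathcal{F}} |I_T| \lesssim_\mu 2^{-2m} \big( 2^{-2m} |E|^{-1} \big)^{1/\mu}$$

for all $\mu > 0$.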



We will now eliminate the role of the set E, replacing it with a certain counting function 
multiplicity, and also eliminate the role of the size parameter 2 m . More precisely, in this 
section we shall deduce Theorem 9.1 from 

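Theorem 9.2 (Maximal Bessel inequality, second reduction). Let $\mathcal{F}$ be a forest. Let
$\mu > 0$ and suppose that $N$ is sufficiently large depending on $\mu$. Let $\psi_P$, $\phi_P$ be as in
Theorem 9.1. Let $f \in L^2(\mathbb{R})$ be such that

$$1 \le \Big( \frac{1}{|I_T|} \sum_{P \in T} |\langle f, \phi_P \rangle|^2 \Big)^{1/2} \le 2 \qquad (46)$$

for each $T \in \mathcal{F}$, and

$$\Big( \frac{1}{|I_{T'}|} \sum_{P \in T : I_P \subset I_{T'}} |\langle f, \phi_P \rangle|^2 \Big)^{1/2} \le 2 \qquad (47)$$

for each $T' \subset T \in \mathcal{F}$. Let $N_{\mathcal{F}}$ be the counting function

$$N_{\mathcal{F}} := \sum_{T \in \mathcal{F}} 1_{I_T}, \qquad (48)$$

and let $I_0$ be any interval which contains the support of $N_{\mathcal{F}}$. Then we have the Bessel-type
inequality

$$\sum_{P \in \bigcup_{T \in \mathcal{F}} T} |\langle f, \phi_P \rangle|^2 \lesssim_\mu \|N_{\mathcal{F}}\|_{L^\infty}^{1/\mu} \int |f|^2 \tilde\chi_{I_0}^{10}. \qquad (49)$$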

We shall prove Theorem 9.2 in later sections. For now, we show how it implies Theorem
9.1 (and hence Theorem 8.1). The argument is similar to the "good-$\lambda$" type estimates used
to prove John-Nirenberg BMO inequalities, and to emphasize this connection (and because
this theory may be of some independent interest) we shall proceed in a somewhat abstract manner.

The following observation is trivial, but is still worth recording. 

Lemma 9.3 (Forest refinement). Let $\mathcal{F}$ be a forest. For each tree $T$ in $\mathcal{F}$, let $\mathcal{F}_T$ be a
collection of subtrees of $T$ with disjoint time intervals. Then $\bigcup_{T \in \mathcal{F}} \mathcal{F}_T$ is also a forest.

Let $\mathcal{F}$ be any forest. The quantity $\|N_{\mathcal{F}}\|_{L^\infty}$ measures the maximum possible overlap of
the time intervals $I_T$ of the trees $T$ in $\mathcal{F}$. We shall introduce a closely related quantity
$\|\mathcal{F}\|_{BMO}$, defined as

$$\|\mathcal{F}\|_{BMO} := \sup_I \frac{1}{|I|} \sum_{T \in \mathcal{F} : I_T \subset I} |I_T|,$$

where the supremum is taken over all the dyadic intervals $I$.

Remark 9.2. One can relate this BMO-type norm to the genuine (dyadic, vector-valued) 

BMO norm by the formula H^Ubmo — II-^YHIbmo' wnere '■= I^TeF ^T e T is a vector- 
valued counting function, with the e T being orthonormal vectors in an abstract Hilbert 
space. However, we will not adopt this approach since the theory of vector-valued BMO is
not as familiar as that of ordinary BMO, preferring instead a more direct and elementary 
approach. 

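It is clear that $\|\mathcal{F}\|_{BMO} \le \|N_{\mathcal{F}}\|_{L^\infty}$; indeed

$$\frac{1}{|I|} \sum_{T \in \mathcal{F} : I_T \subset I} |I_T| = \frac{1}{|I|} \int_I \sum_{T \in \mathcal{F} : I_T \subset I} 1_{I_T} \le \Big\| \sum_{T \in \mathcal{F}} 1_{I_T} \Big\|_{L^\infty}.$$

While the converse is not quite true, we do expect the $L^\infty$ norm and BMO norm to be
very close.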
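
The two quantities are easy to compute for a finite list of dyadic time intervals. The sketch below does this under simplifying assumptions: only the time intervals $I_T$ are kept (the tiles and the strong disjointness of a forest are ignored), intervals are encoded as hypothetical pairs `(k, n)` for $[n 2^k, (n+1) 2^k)$, and both suprema are taken over the listed intervals only, which is enough for nested dyadic families.

```python
from fractions import Fraction

def length(interval):
    k, _ = interval
    return Fraction(2) ** k

def contains(outer, inner):
    """True if the dyadic interval `inner` is a subset of `outer`."""
    (K, N), (k, n) = outer, inner
    return k <= K and (n >> (K - k)) == N

def linf_norm(intervals):
    """Maximal overlap of the intervals, i.e. the sup of the counting function.

    Dyadic intervals through a point are nested, so the maximum is attained
    on one of the listed intervals J: count the intervals containing J.
    """
    return max(sum(1 for I in intervals if contains(I, J)) for J in intervals)

def bmo_norm(intervals):
    """sup over I of (1/|I|) * (total length of the intervals inside I)."""
    return max(sum(length(J) for J in intervals if contains(I, J)) / length(I)
               for I in intervals)

# Lacunary pile-up at the origin: [0, 2**-k) for k = 0, ..., 9.
pileup = [(-k, 0) for k in range(10)]
overlap = linf_norm(pileup)
bmo = bmo_norm(pileup)
```

For this bare family of intervals the overlap is 10 while the BMO-type quantity stays below 2, which is consistent with the inequality above and shows why the reverse inequality cannot hold for arbitrary collections of intervals.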

Now we obtain some good lambda inequalities for the BMO norm. We first observe 
that to control the BMO norm of a collection T of trees, it suffices to control the BMO 
norm of subcollections of trees which are already controlled in L°°. 

Lemma 9.4. Let $\mathcal{F}$ be a forest such that

$$\|\mathcal{F}'\|_{BMO} \le B$$

whenever $\mathcal{F}' \subset \mathcal{F}$ is such that $\|N_{\mathcal{F}'}\|_{L^\infty} \le 2B$. Then we have

$$\|\mathcal{F}\|_{BMO} \le 2B.$$

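Proof. Let $I_0$ be a dyadic interval. Call a dyadic interval $J \subset I_0$ heavy if
$|\{T \in \mathcal{F} : J \subset I_T \subset I_0\}| > 2B$, and let $\mathcal{F}'$ be the collection of those trees $T \in \mathcal{F}$ such that $I_T \subset I_0$
and that $I_T$ is not heavy. Then by construction we have $\|N_{\mathcal{F}'}\|_{L^\infty} \le 2B$, and hence by
hypothesis $\|\mathcal{F}'\|_{BMO} \le B$. In particular

$$\sum_{T \in \mathcal{F} : I_T \subset I_0;\ I_T \text{ not heavy}} |I_T| \le B |I_0|.$$

Now we deal with the heavy intervals. If we let $\mathcal{J}$ denote the set of maximal dyadic
heavy intervals, then we have

$$\begin{aligned}
\sum_{T \in \mathcal{F} : I_T \subset I_0;\ I_T \text{ heavy}} |I_T|
&\le \sum_{J \in \mathcal{J}} \sum_{T \in \mathcal{F} : I_T \subset J} |I_T| \\
&\le \|\mathcal{F}\|_{BMO} \sum_{J \in \mathcal{J}} |J| \\
&= \|\mathcal{F}\|_{BMO} \Big| \bigcup_{J \in \mathcal{J}} J \Big| \\
&\le \frac{\|\mathcal{F}\|_{BMO}}{2B} \int_{\bigcup_{J \in \mathcal{J}} J} \sum_{T \in \mathcal{F}'} 1_{I_T} \\
&\le \frac{\|\mathcal{F}\|_{BMO}}{2B} \sum_{T \in \mathcal{F}'} |I_T| \\
&\le \frac{\|\mathcal{F}\|_{BMO}}{2B} |I_0| \, \|\mathcal{F}'\|_{BMO} \\
&\le \frac{\|\mathcal{F}\|_{BMO}}{2} |I_0|.
\end{aligned}$$

Summing these two estimates, we obtain

$$\frac{1}{|I_0|} \sum_{T \in \mathcal{F} : I_T \subset I_0} |I_T| \le B + \frac{\|\mathcal{F}\|_{BMO}}{2},$$

and then taking supremum over $I_0$ we obtain

$$\|\mathcal{F}\|_{BMO} \le B + \frac{\|\mathcal{F}\|_{BMO}}{2}.$$

The claim follows. ■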

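Similarly, to control the $L^1$ norm of $N_{\mathcal{F}}$, it suffices to control the $L^1$ norm of
subcollections $\mathcal{F}'$ which are controlled in $L^\infty$ by the BMO norm of $\mathcal{F}$:

Lemma 9.5. Let $\mathcal{F}$ be a forest such that

$$\|N_{\mathcal{F}'}\|_{L^1} \le A$$

whenever $\mathcal{F}' \subset \mathcal{F}$ is such that $\|N_{\mathcal{F}'}\|_{L^\infty} \le \|\mathcal{F}\|_{BMO}$. Then we have

$$\|N_{\mathcal{F}}\|_{L^1} \le 2A.$$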
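Proof. Set $B := \|\mathcal{F}\|_{BMO}$. As before, we call a dyadic interval $J \subset I_0$ heavy if
$|\{T \in \mathcal{F} : J \subset I_T \subset I_0\}| > B$, and let $\mathcal{F}'$ be the collection of those trees $T \in \mathcal{F}$ such that $I_T \subset I_0$
and that $I_T$ is not heavy. Then by construction we have $\|N_{\mathcal{F}'}\|_{L^\infty} \le B$ and hence by
hypothesis $\|N_{\mathcal{F}'}\|_{L^1} \le A$. Now if we let $\mathcal{J}$ be the collection of maximal heavy intervals,
then we have

$$\begin{aligned}
\|N_{\mathcal{F} \setminus \mathcal{F}'}\|_{L^1}
&= \sum_{J \in \mathcal{J}} \sum_{T \in \mathcal{F} : I_T \subset J} |I_T| \\
&\le \|\mathcal{F}\|_{BMO} \sum_{J \in \mathcal{J}} |J| \\
&= B \Big| \bigcup_{J \in \mathcal{J}} J \Big| \\
&\le \int_{\bigcup_{J \in \mathcal{J}} J} \sum_{T \in \mathcal{F}'} 1_{I_T} \\
&\le \|N_{\mathcal{F}'}\|_{L^1} \\
&\le A
\end{aligned}$$

and the claim follows. ■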

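We can of course combine these two lemmas to obtain

Corollary 9.6. Let $\mathcal{F}$ be a forest such that

$$\|N_{\mathcal{F}'}\|_{L^1} \le A \quad \text{and} \quad \|\mathcal{F}'\|_{BMO} \le B$$

whenever $\mathcal{F}' \subset \mathcal{F}$ is such that $\|N_{\mathcal{F}'}\|_{L^\infty} \le 2B$. Then we have

$$\|N_{\mathcal{F}}\|_{L^1} \le 2A \quad \text{and} \quad \|\mathcal{F}\|_{BMO} \le 2B.$$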

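A specific case of this is

Corollary 9.7. Let $\mathcal{F}$ be a forest such that for some $\mu > 1$

$$\|N_{\mathcal{F}'}\|_{L^1} \le A \|N_{\mathcal{F}'}\|_{L^\infty}^{1/\mu} \quad \text{and} \quad \|\mathcal{F}'\|_{BMO} \le B \|N_{\mathcal{F}'}\|_{L^\infty}^{1/\mu}$$

for all $\mathcal{F}' \subset \mathcal{F}$. Then we have

$$\|N_{\mathcal{F}}\|_{L^1} \lesssim A B^{\frac{1}{\mu-1}} \quad \text{and} \quad \|\mathcal{F}\|_{BMO} \lesssim B^{\frac{\mu}{\mu-1}}.$$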



Thus to prove a counting function estimate on $\|N_{\mathcal{F}}\|_{L^1}$, we are permitted to lose a small
power of $\|N_{\mathcal{F}}\|_{L^\infty}$ as long as the argument also works for all subtrees and localizes to
a BMO version as well (with a different constant $B$).
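
For the reader's convenience, here is one way to see how Corollary 9.7 follows from Corollary 9.6. It is a sketch under the reading that the hypotheses of Corollary 9.7 are $\|N_{\mathcal{F}'}\|_{L^1} \le A \|N_{\mathcal{F}'}\|_{L^\infty}^{1/\mu}$ and $\|\mathcal{F}'\|_{BMO} \le B \|N_{\mathcal{F}'}\|_{L^\infty}^{1/\mu}$ for all $\mathcal{F}' \subset \mathcal{F}$; the auxiliary constants $A'$, $B'$ are introduced only for this computation.

```latex
% Choose B' so that B (2B')^{1/\mu} = B', and set A' := A (2B')^{1/\mu}:
\[
  B' := 2^{\frac{1}{\mu-1}} B^{\frac{\mu}{\mu-1}}, \qquad
  A' := A\,(2B')^{1/\mu} = A\,\frac{B'}{B} = 2^{\frac{1}{\mu-1}} A\, B^{\frac{1}{\mu-1}} .
\]
% If \|N_{\mathcal{F}'}\|_{L^\infty} \le 2B', the hypotheses give
\[
  \|N_{\mathcal{F}'}\|_{L^1} \le A\,(2B')^{1/\mu} = A', \qquad
  \|\mathcal{F}'\|_{BMO} \le B\,(2B')^{1/\mu} = B' ,
\]
% so Corollary 9.6, applied with (A', B') in place of (A, B), yields
\[
  \|N_{\mathcal{F}}\|_{L^1} \le 2A' = 2^{\frac{\mu}{\mu-1}} A\, B^{\frac{1}{\mu-1}}, \qquad
  \|\mathcal{F}\|_{BMO} \le 2B' = 2^{\frac{\mu}{\mu-1}} B^{\frac{\mu}{\mu-1}} .
\]
```

The implied constants therefore depend only on $\mu$, and they blow up as $\mu \to 1$.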

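Now we can finally prove Theorem 9.1.

Proof [of Theorem 9.1] Let $\mathcal{F}' \subset \mathcal{F}$ be arbitrary. From Theorem 9.2 with $f$ replaced by
$f/2^m$, and $I_0$ chosen to be so large as to contain all the time intervals arising from $\mathcal{F}'$, we
have

$$\begin{aligned}
\|N_{\mathcal{F}'}\|_{L^1} = \sum_{T \in \mathcal{F}'} |I_T|
&\le \sum_{P \in \bigcup_{T \in \mathcal{F}'} T} |\langle f/2^m, \phi_P \rangle|^2 \\
&\lesssim_\mu \|N_{\mathcal{F}'}\|_{L^\infty}^{1/\mu} \int |f/2^m|^2 \\
&\le 2^{-2m} \|N_{\mathcal{F}'}\|_{L^\infty}^{1/\mu}
\end{aligned}$$

thanks to the $L^2$ normalization of $f \in X_2(E)$. If we let $I_0$ be an arbitrary dyadic interval,
then by replacing $\mathcal{F}'$ by $\{T \in \mathcal{F}' : I_T \subset I_0\}$ in the above argument we see that

$$\begin{aligned}
\sum_{T \in \mathcal{F}' : I_T \subset I_0} |I_T|
&\lesssim_\mu \|N_{\mathcal{F}'}\|_{L^\infty}^{1/\mu} \int |f/2^m|^2 \tilde\chi_{I_0}^{10} \\
&\lesssim \|N_{\mathcal{F}'}\|_{L^\infty}^{1/\mu} 2^{-2m} |I_0| |E|^{-1}
\end{aligned}$$

thanks to the uniform bound of $|E|^{-1/2}$ on $f \in X_2(E)$. Taking suprema over $I_0$ we
conclude that $\|\mathcal{F}'\|_{BMO} \lesssim_\mu \|N_{\mathcal{F}'}\|_{L^\infty}^{1/\mu} 2^{-2m} |E|^{-1}$. Applying Corollary 9.7 we conclude that

$$\sum_{T \in \mathcal{F}} |I_T| = \|N_{\mathcal{F}}\|_{L^1} \lesssim_\mu 2^{-2m} \big( 2^{-2m} |E|^{-1} \big)^{\frac{1}{\mu-1}}.$$

Replacing $\mu$ by $\mu + 1$ we obtain Theorem 9.1. ■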



10. Tileset refinements

It remains to prove Theorem 9.2. In this section we perform some additional elementary
reductions. First we eliminate the localizing weight $\tilde\chi_{I_0}^{10}$ and we permit the deletion of those
tiles which lie inside a small exceptional set. Then we sparsify the tile set, and remove
some logarithmic pileups of time interval multiplicity.

We begin with the first reduction. We assert that to prove Theorem 9.2 it suffices to
prove the same assertion with the weight $\tilde\chi_{I_0}^{10}$ not present in (49). The reason for this is
that $\tilde\chi_{I_0}^{-10}$ is a polynomial, and because of this (and the hypothesis that all the tiles have
time interval contained in $I_0$) $\tilde\chi_{I_0}^{-10} \psi_P$ is a wave packet adapted to $P$, except for the trivial
change that the exponent of $10N$ in (27) must be reduced slightly to $10(N-1)$. But
this clearly makes no essential difference to the argument since we are free to take $N$ as
large as we wish. Since $\langle f, \phi_P \rangle = \langle \tilde\chi_{I_0}^{10} f, \tilde\chi_{I_0}^{-10} \phi_P \rangle$, we thus see that Theorem 9.2 with the
localizing weight $\tilde\chi_{I_0}^{10}$ follows more or less automatically from Theorem 9.2 without the
localizing weight.

The next step is to eliminate the hypotheses (46), (47) and also give the ability to
delete a small exceptional collection of tiles.

Theorem 10.1 (Maximal Bessel inequality, third reduction). Let $\mathcal{F}$ be a forest. Let
$\mu > 0$ and suppose that $N$ is sufficiently large depending on $\mu$. Let $\psi_P$, $\phi_P$ be as in
Theorem 9.1, and let $N_{\mathcal{F}}$ be the counting function (48). Then there exists an exceptional
set $\mathbf{P}_* \subset \bigcup_{T \in \mathcal{F}} T$ of tiles with

U '*l <- Tol^t (*» 

PeP* ■ - 

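such that we have the Bessel-type inequality

$$\sum_{P \in \bigcup_{T \in \mathcal{F}} T \setminus \mathbf{P}_*} |\langle f, \phi_P \rangle|^2 \lesssim_\mu \|N_{\mathcal{F}}\|_{L^\infty}^{1/\mu} \|f\|_{L^2}^2 \qquad (51)$$

for all $f \in L^2(\mathbb{R})$.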

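Proof [of Theorem 9.2 assuming Theorem 10.1] Write $\Omega := \bigcup_{P \in \mathbf{P}_*} I_P$. Then

$$\sum_{P \in \bigcup_{T \in \mathcal{F}} T : I_P \not\subset \Omega} |\langle f, \phi_P \rangle|^2 \le \sum_{P \in \bigcup_{T \in \mathcal{F}} T \setminus \mathbf{P}_*} |\langle f, \phi_P \rangle|^2.$$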
To prove (49), it thus suffices in view of (51) to show that 



P£{J TeT T:i P cn PeUxe^T 



From (46), it thus suffices to show that 

E \(fAp)? <\\\NAW- 

Pe{J Te:F T:ipcn 

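For each tree $T$ in $\mathcal{F}$, consider the tile set $\{P \in T : I_P \subset \Omega\}$. If $Q$ is any tile in this set
with $I_Q$ maximal with respect to set inclusion, then $I_Q \subset \Omega$ and from (47) we have

$$\sum_{P \in T : I_P \subset I_Q \subset \Omega} |\langle f, \phi_P \rangle|^2 \le 4 |I_Q|.$$

Summing this over all such $Q$ (noting that the $I_Q$ are disjoint by dyadicity and maximality)
we conclude

$$\sum_{P \in T : I_P \subset \Omega} |\langle f, \phi_P \rangle|^2 \le 4 |I_T \cap \Omega| = 4 \int_\Omega 1_{I_T}.$$

Summing this over all $T \in \mathcal{F}$ we obtain

$$\sum_{P \in \bigcup_{T \in \mathcal{F}} T : I_P \subset \Omega} |\langle f, \phi_P \rangle|^2 \le 4 \int_\Omega N_{\mathcal{F}} \le 4 |\Omega| \, \|N_{\mathcal{F}}\|_{L^\infty}$$

and the claim follows from (50). ■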

We still have to prove Theorem 10.1. The next step will be to sparsify the collection of
tiles. Recall the three dyadic grids $\mathcal{D}_0, \mathcal{D}_1, \mathcal{D}_2$ from (24). One can easily verify that for
every interval $J$ (not necessarily dyadic) there exists a $d \in \{0, 1, 2\}$ and a shifted dyadic
interval $J' \in \mathcal{D}_d$ such that $J \subset J' \subset 3J$; we will say that $J$ is $d$-regular.

Let $A \ge 1$, and let $d \in \{0, 1, 2\}$. We shall say that a collection $\mathcal{I} \subset \mathcal{D}$ of time
intervals is $(A, d)$-sparse if we have the following properties:

(i) If $I, I' \in \mathcal{I}$ are such that $|I| > |I'|$, then $|I| \ge 2^{100A} |I'|$.

(ii) If $I, I' \in \mathcal{I}$ are such that $|I| = |I'|$ and $I \ne I'$, then $\mathrm{dist}(I, I') \ge 100A |I'|$.

(iii) If $I \in \mathcal{I}$, then $AI$ is $d$-regular, thus there exists an interval $I_A \in \mathcal{D}_d$ such that
$AI \subset I_A \subset 3AI$. We refer to $I_A$ as the $A$-enlargement of $I$.

If $\mathcal{I}$ is an $(A, d)$-sparse set of time intervals and $P$ is a tile whose time interval $I_P$ lies in
$\mathcal{I}$, we write $I_{P,A}$ for the $A$-enlargement of $I_P$. Similarly if $T$ is a tree whose time interval
$I_T$ lies in $\mathcal{I}$, we write $I_{T,A}$ for the $A$-enlargement of $I_T$.

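To prove Theorem 10.1, it suffices to prove a variant for $(A, d)$-sparse sets of tiles. More
precisely, we can reduce to

Theorem 10.2 (Maximal Bessel inequality, fourth reduction). Let $A, D, \nu \ge 1$, and
suppose that $N$ is sufficiently large depending on $\nu$. Let $\mathcal{F}$ be a forest with $\|N_{\mathcal{F}}\|_{L^\infty} \le D$.
Let $\mathbf{P} := \bigcup_{T \in \mathcal{F}} T$, and suppose that the time intervals

$$\{I_P : P \in \mathbf{P}\} \cup \{I_T : T \in \mathcal{F}\}$$

are $(A, d)$-sparse. Let $\psi_P$, $\phi_P$ be as in Theorem 9.1. Then there exists an exceptional set
$\mathbf{P}_* \subset \bigcup_{T \in \mathcal{F}} T$ of tiles with

$$\Big| \bigcup_{P \in \mathbf{P}_*} I_P \Big| \lesssim_\nu (A^{-\nu} + D^{-\nu}) \sum_{T \in \mathcal{F}} |I_T| \qquad (52)$$

such that we have the Bessel-type inequality

$$\sum_{P \in \mathbf{P} \setminus \mathbf{P}_*} |\langle f, \phi_P \rangle|^2 \lesssim_\nu \big( (\log(2 + AD))^{10} + A^{10-\nu} D^{10} \big) \|f\|_{L^2}^2$$

for all $f \in L^2(\mathbb{R})$.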

Proof [of Theorem 10.1 assuming Theorem 10.2] Let $A, \nu$ be chosen later, and set $D :=
\|N_{\mathcal{F}}\|_{L^\infty}$. We need the following lemma:

Lemma 10.3 (Sparsification). Let $\mathcal{I}$ be a collection of time intervals. Then we can split
$\mathcal{I} = \mathcal{I}_1 \cup \ldots \cup \mathcal{I}_L$ with $L = O(A^2)$ such that each $\mathcal{I}_l$ for $1 \le l \le L$ is $(A, d)$-sparse for
some $d = 0, 1, 2$.

Proof. By pigeonholing the scale parameter into cosets of $100A\mathbb{Z}$, we can partition $\mathcal{I}$
into $100A$ subcollections, such that on each subcollection we have the scale separation
property (i) from the definition of $(A, d)$-sparseness. Similarly if we partition the position
parameter at each fixed scale into cosets of $100A\mathbb{Z}$, we can partition further into $(100A)^2$
subcollections on which we also have the position separation property (ii). Finally, we
make the elementary observation that for each dyadic $I \in \mathcal{D}_0$ there exists $d = 0, 1, 2$ such
that there exists $I_A \in \mathcal{D}_d$ with $AI \subset I_A \subset 3AI$. A final pigeonholing based on the $d$
parameter concludes the claim. ■
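
The pigeonholing in the first two steps of this proof can be sketched in a few lines. The code below is an illustration under stated assumptions, not the construction of the paper: dyadic intervals are encoded as hypothetical pairs `(k, n)` for $[n 2^k, (n+1) 2^k)$, the separation parameters are passed in explicitly as `scale_sep` and `pos_sep` rather than fixed to $100A$, and the $d$-regularity property (iii) is not treated. To guarantee a gap of at least `min_gap` interval lengths between distinct intervals of the same length, the position modulus should be `min_gap + 1`, since neighbours at index distance $r$ are separated by $r - 1$ lengths.

```python
from collections import defaultdict

def sparsify(intervals, scale_sep, pos_sep):
    """Split dyadic intervals (k, n) into residue classes.

    Intervals in the same class have scales k agreeing modulo `scale_sep`
    and positions n agreeing modulo `pos_sep`.
    """
    classes = defaultdict(list)
    for (k, n) in intervals:
        classes[(k % scale_sep, n % pos_sep)].append((k, n))
    return list(classes.values())

def is_separated(cls, scale_sep, min_gap):
    """Check analogues of properties (i) and (ii) on one class.

    (i)  different lengths differ by a factor of at least 2**scale_sep;
    (ii) distinct intervals of equal length are at distance at least
         min_gap times their common length.
    """
    for (k, n) in cls:
        for (k2, n2) in cls:
            if k > k2 and k - k2 < scale_sep:
                return False
            if k == k2 and n != n2 and abs(n - n2) - 1 < min_gap:
                return False
    return True

intervals = [(k, n) for k in range(-5, 6) for n in range(-7, 8)]
classes = sparsify(intervals, scale_sep=3, pos_sep=5)
```

The number of classes is at most `scale_sep * pos_sep`, which is the source of the $O(A^2)$ count in Lemma 10.3 when both parameters are comparable to $A$.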

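We apply this lemma to the set $\mathcal{I} := \{I_P : P \in \mathbf{P}\}$ to split $\mathcal{I}$ into $\mathcal{I}_1, \ldots, \mathcal{I}_L$ for some
$L = O(A^2)$. Then we have $\mathbf{P} = \mathbf{P}_1 \cup \ldots \cup \mathbf{P}_L$, where $\mathbf{P}_l := \{P \in \mathbf{P} : I_P \in \mathcal{I}_l\}$. Observe
that if $T$ is a lacunary tree in $\mathcal{F}$, then $T \cap \mathbf{P}_l$ is also a lacunary tree. The time interval $I_T$
of this tree need not lie in $\mathcal{I}_l$, however one can partition $T \cap \mathbf{P}_l$ into subtrees with this
property. More precisely, if we let $I$ be any interval in $\{I_P : P \in T \cap \mathbf{P}_l\}$ which is maximal
with respect to set inclusion, then $(\{P \in T \cap \mathbf{P}_l : I_P \subset I\}, I, \xi_T)$ is a lacunary tree whose
time interval $I$ also lies in $\mathcal{I}_l$. Let $\mathcal{F}_l$ be the collection of all the trees obtained in
this manner for fixed $l$, as $T$ varies over $\mathcal{F}$ and $I$ varies over the maximal intervals in
$\{I_P : P \in T \cap \mathbf{P}_l\}$, thus $\mathbf{P}_l = \bigcup_{T \in \mathcal{F}_l} T$. Since for each fixed $T$ the intervals $I$ are disjoint,
one easily verifies the pointwise estimate $N_{\mathcal{F}_l} \le N_{\mathcal{F}}$, and hence $\|N_{\mathcal{F}_l}\|_{L^\infty} \le D$. Applying
Theorem 10.2 (if $N$ is large depending on $\nu$), one can then obtain an exceptional set
$\mathbf{P}_{l,*} \subset \mathbf{P}_l$ obeying (52) such that

$$\sum_{P \in \mathbf{P}_l \setminus \mathbf{P}_{l,*}} |\langle f, \phi_P \rangle|^2 \lesssim_\nu \big( (\log(2 + AD))^{10} + A^{10-\nu} D^{10} \big) \|f\|_{L^2}^2.$$

Setting $\mathbf{P}_* := \bigcup_{1 \le l \le L} \mathbf{P}_{l,*}$ we thus conclude

$$\Big| \bigcup_{P \in \mathbf{P}_*} I_P \Big| \lesssim_\nu A^2 (A^{-\nu} + D^{-\nu}) \sum_{T \in \mathcal{F}} |I_T|$$

and

$$\sum_{P \in \mathbf{P} \setminus \mathbf{P}_*} |\langle f, \phi_P \rangle|^2 \lesssim_\nu A^2 \big( (\log(2 + AD))^{10} + A^{10-\nu} D^{10} \big) \|f\|_{L^2}^2.$$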

If we then set v := 100 + 400/i and A := C^D 1 ^^ for a large constant C M we obtain the 
claim. ■ 

The hypothesis in Theorem 10.2 is currently assuming some control on the quantity
$\|N_{\mathcal{F}}\|_{L^\infty} = \|\sum_{T \in \mathcal{F}} 1_{I_T}\|_{L^\infty}$. In the arguments which follow, it is more convenient to
assume control on the larger quantity $\|\sum_{T \in \mathcal{F}} M 1_{I_T}\|_{L^\infty}$, where of course $M$ is the Hardy-
Littlewood maximal function. It is not necessarily the case that control of the former
implies control of the latter, due to "logarithmic pile-ups" such as those where the intervals
$I_T$ are lacunary around a fixed origin; this is also related to the failure of the Fefferman-
Stein vector-valued maximal inequality [9] at this endpoint. Nevertheless, by removing all
the tiles in a small set it is possible to control the latter from the former. More precisely,
we have

Lemma 10.4. Let X be a finite set of intervals in T>d for some d = 0,1,2 such that 
II Yliei — D for some D. Then X can be split into two collections I = l'ul b such 

that 

nE( M1 ') 2 iu~ s D " 

and 

i U J i s^Ei'i- ( 53 ) 

/ex b P^ 

Proof See [13, Lemma 3.14]. ■ 
As a consequence, we can reduce Theorem 10.2 to 

Theorem 10.5 (Maximal Bessel inequality, fifth reduction). Let $A, M, \nu \ge 1$, and suppose
that $N$ is sufficiently large depending on $\nu$. Let $\mathcal{F}$ be a forest with

$$\Big\| \sum_{T \in \mathcal{F}} M 1_{I_T} \Big\|_{L^\infty} \le M. \qquad (54)$$

Let $\mathbf{P} := \bigcup_{T \in \mathcal{F}} T$, and suppose that the time intervals

$$\{I_P : P \in \mathbf{P}\} \cup \{I_T : T \in \mathcal{F}\}$$

are $(A, d)$-sparse. Suppose also that we have the technical condition

$$\sup_{x \in I_P} \mathrm{dist}(x, \partial I_T) \ge A^{-\nu} |I_T| \qquad (55)$$

for all $P \in \mathbf{P}$ and $T \in \mathcal{F}$ (this ensures that tiles do not cluster near the edges of trees).
Let $\psi_P$, $\phi_P$ be as in Theorem 9.1. Then we have the Bessel-type inequality

$$\sum_{P \in \mathbf{P}} |\langle f, \phi_P \rangle|^2 \lesssim_\nu \big( (\log(2 + AM))^{10} + A^{1-\nu} M^2 \big) \|f\|_{L^2}^2 \qquad (56)$$

for all $f \in L^2(\mathbb{R})$.

Proof [of Theorem 10.2 assuming Theorem 10.5] Apply Lemma 10.4 to the collection
X := {/ T : T G J 7 } to create the partition X = X" U X b with the desired properties. Set 

P* := |J TU (J {P G P : sup dist(x, 8I T ) < A~ V \I T \}. 

Observe that 

(J Ip Q [J I U |J {a; G Jp : dist(x, 9J T ) < A~ U \I T \} 
PeP* /gx b Te.P 

and hence by (53) 

U Ip <* (D-» + A~") J2 \ J t\- 
PeP, t&i t 
Now since the intervals It have multiplicity at most D, we have 

II Yl Mi^iUo^rjH^Mi.iUocSrj 4 - 

Applying Theorem 10.2 with M <^> v (and all the trees with spatial interval in X" have 
been completely removed from P\P*) we obtain 

E i</» <mi 2 s ( lo s( 2 + AD ^ W + ^-"^n/iii- 

psp\p, 

and the claim follows. ■ 

It remains to prove Theorem 10.5. We may dualize (56), observing that it is equivalent
to the estimate

$$ \Big\| \sum_{P \in \mathbf P} a_P \phi_P \Big\|_{L^2} \lesssim \big( (\log(2 + AM))^{10} + A^{1-\nu} M^2 \big) \|a\|_{l^2} $$

for any sequence $a = (a_P)_{P \in \mathbf P}$ of complex numbers. By definition of $\phi_P$, it thus suffices
to show the maximal Bessel-type inequality

$$ \Big\| \sup_k \Big| \sum_{P \in \mathbf P : |I_P| > 2^k} a_P \psi_P \Big| \Big\|_{L^2} \lesssim_\nu \big( (\log(2 + AM))^{10} + A^{1-\nu} M^2 \big) \|a\|_{l^2}. \tag{57} $$

At this point we shall pause to sketch the general strategy we shall employ to prove (57),
following [13]. First we shall split the tile set $\mathbf P$ into layers $\mathbf P = \mathbf P_1 \cup \ldots \cup \mathbf P_J$. Roughly
speaking, the idea is to arrange these layers so that the time intervals of $\mathbf P_{j'}$ tend to be
(locally) wider than those of $\mathbf P_j$ when $j' < j$. Since $\psi_P$ is essentially concentrated in $I_P$
(or more accurately $I_{P,A}$), this heuristically gives rise to an estimate of the form

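$$ \sup_k \Big| \sum_{P \in \mathbf P : |I_P| > 2^k} a_P \psi_P \Big| \lesssim \sup_j \Big[ \Big| \sum_{j' < j} \sum_{P \in \mathbf P_{j'}} a_P \psi_P \Big| + \sup_k \Big| \sum_{P \in \mathbf P_j : |I_P| > 2^k} a_P \psi_P \Big| \Big]. $$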

To deal with the former expression we shall use the Rademacher-Menshov inequality and a
non-maximal Bessel inequality (which is essentially (57) without the supremum in $k$, and
with somewhat fewer logarithmic losses on the right-hand side). To deal with the second
term we replace the supremum in $j$ by a square function, and reduce to controlling the
contribution of a localized expression over a single generation $\mathbf P_j$ (which will ultimately
reduce to a certain maximal inequality of Bourgain [4]).

For technical reasons it turns out that one needs to treat the "boundary" of the layers 
Pj separately from the rest of the Pj, in order to improve the separation properties 
between layers. As such we will have to execute the above strategy twice, once for the 
boundary tiles and once for the interior tiles. 

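We now turn to the details, beginning with the selection of the layers. Introduce the
sets $\mathcal I \subset \mathcal D$ and $\mathcal I_A \subset \mathcal D_d$ by

$$ \mathcal I := \{I_T : T \in \mathcal F\}; \qquad \mathcal I_A := \{I_{T,A} : T \in \mathcal F\}. $$

Observe that the $(A, d)$-sparseness of $\mathcal I$ ensures that the map $I \mapsto I_A$ is a bijection from
$\mathcal I$ to $\mathcal I_A$ which preserves the set inclusion relation. Since $A I_T \subset I_{T,A} \subset 3A I_T$ we see that

$$ 1_{I_{T,A}} \le 10 A \, \mathrm{M} 1_{I_T} $$

and hence by (54) we have the multiplicity bound

$$ \Big\| \sum_{I \in \mathcal I_A} 1_I \Big\|_{L^\infty} \le 10AM. \tag{58} $$

We then partition

$$ \mathcal I_A = \mathcal I_A^{(1)} \cup \mathcal I_A^{(2)} \cup \ldots \cup \mathcal I_A^{(10AM)} $$

recursively by defining $\mathcal I_A^{(j)}$ to be those intervals in $\mathcal I_A \setminus \bigcup_{i<j} \mathcal I_A^{(i)}$ which are maximal with
respect to set inclusion, thus $\mathcal I_A^{(j)}$ is a collection of disjoint intervals in the dyadic grid
$\mathcal D_d$. Observe that for $1 < j \le 10AM$, each interval in $\mathcal I_A^{(j)}$ is contained in exactly one
interval in $\mathcal I_A^{(j-1)}$; since $\|\sum_{I \in \mathcal I_A} 1_I\|_{L^\infty} \le 10AM$, we conclude that $\mathcal I_A^{(1)}, \ldots, \mathcal I_A^{(10AM)}$ do
indeed partition $\mathcal I_A$. Using the bijection between $\mathcal I$ and $\mathcal I_A$, we thus induce a partition
$\mathcal I = \mathcal I^{(1)} \cup \ldots \cup \mathcal I^{(10AM)}$ of $\mathcal I$.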
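The recursive selection of layers can be illustrated on a toy family of intervals. The following Python sketch is not part of the argument: the helper name `peel_layers` and the representation of intervals as `(left, right)` pairs are assumptions made for illustration, and the family is assumed to be free of duplicates and nested-or-disjoint (as is the case for intervals taken from a single dyadic grid).

```python
def peel_layers(intervals):
    """Split a nested-or-disjoint family of intervals into layers.

    Layer 1 holds the intervals that are maximal with respect to set
    inclusion; layer j holds the maximal ones among those not yet chosen.
    Intervals are half-open pairs (left, right) with left < right.
    """
    remaining = set(intervals)
    layers = []
    while remaining:
        # An interval is maximal if no *other* remaining interval contains it.
        maximal = {
            I for I in remaining
            if not any(J != I and J[0] <= I[0] and I[1] <= J[1]
                       for J in remaining)
        }
        layers.append(sorted(maximal))
        remaining -= maximal
    return layers


# Toy dyadic family; three of its intervals contain the point 0.
family = [(0, 8), (0, 4), (4, 8), (0, 1), (6, 8), (8, 16)]
layers = peel_layers(family)
for j, layer in enumerate(layers, start=1):
    print(j, layer)
```

In this toy family the number of layers coincides with the largest number of intervals sharing a common point, which is the mechanism by which the multiplicity bound (58) limits the number of layers.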

Let $1 \le j \le 10AM$. For each $I \in \mathcal I^{(j)}$, let $\mathbf P_I$ denote the tiles with time interval $I$:

$$ \mathbf P_I := \{P \in \mathbf P : I_P = I\}. $$

Observe that each tree $T$ in $\mathcal F$ contributes at most one tile to $\mathbf P_I$, by definition of a
lacunary tree, and if $T$ does contribute a tile then $1_I \le 1_{I_T}$. By (54) we thus have

$$ \# \mathbf P_I \le M \quad \text{for all } I \in \mathcal I. \tag{59} $$

We also introduce the tileset $\mathbf P_{<I}$ for $I \in \mathcal I^{(j)}$ by

$$ \mathbf P_{<I} := \Big\{P \in \mathbf P : I_P \subsetneq I; \ I_P \not\subset J \text{ for all } J \in \bigcup_{i > j} \mathcal I^{(i)} \Big\}; $$

thus $\mathbf P_{<I}$ is the collection of tiles whose time interval is strictly contained in the interval
$I \in \mathcal I^{(j)}$ but is not contained in any interval from a later layer of $\mathcal I$. Since every tile
$P \in \mathbf P$ has its time interval $I_P$ contained in some interval in $\mathcal I$ (because $P$ is contained in
some tree $T \in \mathcal F$, and hence $I_P \subset I_T \in \mathcal I$) we see that we have the partition

$$ \mathbf P = \bigcup_{1 \le j \le 10AM} \ \bigcup_{I \in \mathcal I^{(j)}} (\mathbf P_I \cup \mathbf P_{<I}). $$

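To prove (57), it thus suffices by the triangle inequality to prove the estimates

$$ \Big\| \sup_k \Big| \sum_j \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_I : |I_P| > 2^k} a_P \psi_P \Big| \Big\|_{L^2} \lesssim_\nu \big( (\log(2 + AM))^{10} + A^{1-\nu} M^2 \big) \|a\|_{l^2} \tag{60} $$

and

$$ \Big\| \sup_k \Big| \sum_j \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_{<I} : |I_P| > 2^k} a_P \psi_P \Big| \Big\|_{L^2} \lesssim_\nu \big( (\log(2 + AM))^{10} + A^{1-\nu} M^2 \big) \|a\|_{l^2}. \tag{61} $$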

The estimate (60) is easier and is proven in Section 11. The estimate (61) is more
difficult, relying in particular on a certain inequality of Bourgain, and is proven in Section
12. To conclude this section, we present two tools which will be used to prove both (60)
and (61). The first is a non-maximal Bessel inequality, and more precisely the bound

$$ \Big\| \sum_{P \in \mathbf P} a_P \psi_P \Big\|_{L^2} \lesssim \log(2 + M) \|a\|_{l^2}. \tag{62} $$

This inequality may be of some independent interest and is proven in Section 13. Secondly,
we will rely on the following form of the standard Rademacher-Menshov inequality, whose
proof we include for the sake of completeness. We observe first the trivial bound

$$ \Big\| \sup_i |f_i| \Big\|_{L^2} \le \Big\| \Big( \sum_i |f_i|^2 \Big)^{1/2} \Big\|_{L^2} = \Big( \sum_i \|f_i\|_{L^2}^2 \Big)^{1/2} \tag{63} $$

valid for any finite collection of $L^2$ functions $f_i$. This bound is usually too crude for
applications, as the summation in $i$ usually creates an undesirable polynomial loss in the
estimates; however, one can refine this polynomial loss to a logarithmic loss in the following
way.

Theorem 10.6 (Rademacher-Menshov). Let $(f_i)_{i=1}^L$ be a sequence of functions in $L^2(\mathbf R)$
which are almost orthogonal in the sense that there exists a constant $B$ such that for each
finite sequence $\epsilon_1, \ldots, \epsilon_L \in \{-1, +1\}$ of signs we have

$$ \Big\| \sum_{i=1}^L \epsilon_i f_i \Big\|_{L^2} \le B. $$

Then we have the maximal inequality

$$ \Big\| \sup_{L' \le L} \Big| \sum_{i=1}^{L'} f_i \Big| \Big\|_{L^2} \lesssim B \log(2 + L). $$



Proof We may take the $f_i$ to be real-valued. By adding dummy $f_i$ if necessary, we
may assume that $L = 2^m$ for some integer $m \ge 1$. For each set $I \subset \{1, \ldots, L\}$ let
$f_I := \sum_{i \in I} f_i$. For each $0 \le m' \le m$ let $\mathcal I_{m'}$ denote the collection of sets of the form
$\{2^{m'} j + 1, \ldots, 2^{m'} j + 2^{m'}\}$ for $j = 0, \ldots, 2^{m-m'} - 1$. For each fixed $m'$, the sets in $\mathcal I_{m'}$
partition $\{1, \ldots, L\}$, and thus by hypothesis we have

$$ \Big\| \sum_{I \in \mathcal I_{m'}} \epsilon_I f_I \Big\|_{L^2} \le B $$

for all signs $\epsilon_I = \pm 1$. If we square this inequality we obtain

$$ \sum_{I \in \mathcal I_{m'}} \|f_I\|_{L^2}^2 + \sum_{I, J \in \mathcal I_{m'} : I \ne J} \epsilon_I \epsilon_J \langle f_I, f_J \rangle \le B^2. $$

If we then set $\epsilon_I$ to be independent random signs and take expectations, we conclude

$$ \sum_{I \in \mathcal I_{m'}} \|f_I\|_{L^2}^2 \le B^2. $$

By (63) this implies that

$$ \Big\| \sup_{I \in \mathcal I_{m'}} |f_I| \Big\|_{L^2} \le B. $$

By representing $L'$ in binary and using the triangle inequality we have the pointwise
estimate

$$ \Big| \sum_{i=1}^{L'} f_i \Big| \le \sum_{0 \le m' \le m} \sup_{I \in \mathcal I_{m'}} |f_I| $$

for all $L' \le L$. Taking suprema over all $L'$, taking $L^2$ norms, and applying the triangle
inequality, the claim follows. ■
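The binary decomposition used in the last step of this proof can be made concrete. The sketch below is only an illustration (the function names `dyadic_blocks` and `block_elements` are hypothetical): it splits $\{1, \ldots, L'\}$ into disjoint blocks of the form $\{2^{m'} j + 1, \ldots, 2^{m'} j + 2^{m'}\}$, using at most one block for each scale $m'$, which is what produces one supremum per scale in the pointwise estimate.

```python
def dyadic_blocks(L_prime):
    """Decompose {1, ..., L_prime} into dyadic blocks, one per binary digit.

    Returns a list of (m_prime, j) pairs; each pair stands for the block
    {2**m_prime * j + 1, ..., 2**m_prime * j + 2**m_prime}.
    """
    blocks = []
    start = 0  # number of integers already covered
    for m_prime in reversed(range(L_prime.bit_length())):
        size = 1 << m_prime
        if L_prime & size:
            # start is a multiple of size, because only larger powers of
            # two have been added so far; so the block has the right form.
            blocks.append((m_prime, start // size))
            start += size
    return blocks


def block_elements(m_prime, j):
    """List the integers in the block indexed by (m_prime, j)."""
    size = 1 << m_prime
    return list(range(size * j + 1, size * j + size + 1))


L_prime = 13  # binary 1101
blocks = dyadic_blocks(L_prime)
covered = [i for (m_prime, j) in blocks for i in block_elements(m_prime, j)]
print(blocks)
print(covered == list(range(1, L_prime + 1)))
```

Since each scale occurs at most once, the number of blocks is at most the number of binary digits of $L'$, which is the source of the logarithmic factor in Theorem 10.6.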

11. Proof of (60) 

We first prove the estimate (60), which is relatively easy, and serves as a model for the 
more complicated estimate (61). 

Intuitively, the contribution of the wave packets $\psi_P$ for $P \in \mathbf P_I$ should be localized to
the time interval $I_A$. To exploit this we introduce the tail error

$$ E(x) := \sum_j \sum_{I \in \mathcal I^{(j)} : x \notin I_A} \sum_{P \in \mathbf P_I} |a_P| |\psi_P(x)|. $$

This error is small:
Lemma 11.1 (Tail estimate). We have 

\\E\\ L 2 <„ A-'M^WaWp. 
Proof From (27) one easily verifies the pointwise estimates 

i^pI < i/r i/2 Mi 7 

and the L 1 bound 

III^Ki-i/JIU^^ 10 ^ 10 !/! 172 

whenever P G Pj. The former bound and (54) implies the estimate 

II E E E \I\ 1/2 \a P \\Mx)\\\L~ < M\\a\\ lac 
j iezU):x$i A PePi 
while the latter bound and the triangle inequality implies the bound

II E E El 7 !" 172 !^!^)!^^^ 10 ^ 10 !!^^ 

3 ieiti)-.xgi A PePi 

The claim then follows from interpolation (or from Cauchy-Schwarz), since we assume N 
sufficiently large depending on v. ■ 

To exploit this tail estimate we use the following pointwise inequality:

Lemma 11.2. For almost every $x$ we have

$$ \sup_k \Big| \sum_j \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_I : |I_P| > 2^k} a_P \psi_P(x) \Big| \lesssim \sup_{j_0} \Big| \sum_{j \le j_0} \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_I} a_P \psi_P(x) \Big| + E(x). $$

Proof We may assume that $x$ is not the endpoint of any dyadic interval. It suffices to
show that for every $k$ and $x$ there exists a $j_0$ such that

$$ \Big| \sum_j \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_I : |I_P| > 2^k} a_P \psi_P(x) \Big| \le \Big| \sum_{j \le j_0} \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_I} a_P \psi_P(x) \Big| + O(E(x)). $$

Since $I_P = I$, we can write the left-hand side as

$$ \Big| \sum_j \sum_{I \in \mathcal I^{(j)} : |I| > 2^k} \sum_{P \in \mathbf P_I} a_P \psi_P(x) \Big|. $$

By definition of $E$ and the triangle inequality, we can bound this by

$$ \Big| \sum_j \sum_{I \in \mathcal I^{(j)} : |I| > 2^k ; \, x \in I_A} \sum_{P \in \mathbf P_I} a_P \psi_P(x) \Big| + O(E(x)). $$

For each $1 \le j \le 10AM$, we know that there is at most one interval $I_j \in \mathcal I^{(j)}$ whose
dilate $I_{j,A}$ contains $x$, and furthermore these intervals are decreasing in $j$ (adopting the
convention that $I_j = \emptyset$ if no interval in $\mathcal I^{(j)}$ contains $x$). Thus if we let $j_0$ be the largest $j$
for which $|I_j| > 2^k$ (with $j_0 = 0$ if no such $j$ exists), then we see that if $1 \le j \le 10AM$
and $I \in \mathcal I^{(j)}$ are such that $x \in I_A$, then $|I| > 2^k$ if and only if $j \le j_0$. Thus we can
bound the preceding expression by

$$ \Big| \sum_{j \le j_0} \sum_{I \in \mathcal I^{(j)} : x \in I_A} \sum_{P \in \mathbf P_I} a_P \psi_P(x) \Big| + O(E(x)). $$

One can then remove the constraint $x \in I_A$ by definition of $E(x)$ and the triangle inequality. ■

In light of the above two lemmas, we see that to prove (60) it suffices to show that

$$ \Big\| \sup_{j_0} \Big| \sum_{j \le j_0} \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_I} a_P \psi_P \Big| \Big\|_{L^2} \lesssim (\log(2 + AM))^{10} \|a\|_{l^2}. \tag{64} $$

Applying the Rademacher-Menshov inequality (Theorem 10.6), it suffices to show that

$$ \Big\| \sum_j \epsilon_j \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_I} a_P \psi_P \Big\|_{L^2} \lesssim (\log(2 + AM))^{10} \|a\|_{l^2} $$

for all choices $\epsilon_1, \ldots, \epsilon_{10AM} \in \{-1, +1\}$ of signs. But this follows from the non-maximal
Bessel inequality (62) (with some room to spare), since the $\mathbf P_I$ are disjoint in $\mathbf P$. This
concludes the proof of (60).

12. Proof of (61)

Now we prove (61). We shall argue as in the proof of (60), although the details shall 
be more technical, and we shall also rely crucially on a maximal inequality of Bourgain. 

In the previous section we localized the contribution of $\mathbf P_I$ to the interval $I_A$. It turns
out (because of the $(A, d)$-sparseness hypothesis) that the contribution of $\mathbf P_{<I}$ can be
localized even further, to the interval $I$ itself. To formalize this we again introduce a tail
error

$$ E(x) := \sum_j \sum_{I \in \mathcal I^{(j)} : x \notin I} \sum_{P \in \mathbf P_{<I}} |a_P| |\psi_P(x)|. $$

Lemma 12.1 (Tail estimate). We have

$$ \|E\|_{L^2} \lesssim_\nu A^{1-\nu} M^2 \|a\|_{l^2}. $$

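Proof Since there are only $10AM$ values of $j$, it suffices by the triangle inequality to
show that

$$ \Big\| \sum_{I \in \mathcal I^{(j)} : x \notin I} \sum_{P \in \mathbf P_{<I}} |a_P| |\psi_P(x)| \Big\|_{L^2} \lesssim_\nu A^{-\nu} M \|a\|_{l^2} \tag{65} $$

for each $j$, which we now fix. Suppose for the moment that we could show the pointwise
estimate

$$ \sum_{P \in \mathbf P_{<I}} |a_P| |\psi_P(x)| \lesssim_\nu A^{-\nu} M c_I |I|^{-1/2} (\mathrm{M} 1_I(x))^2 \tag{66} $$

for each $I$ and $x \notin I$, where $c_I := (\sum_{P \in \mathbf P_{<I}} |a_P|^2)^{1/2}$. Then the left-hand side of (65) is
bounded by

$$ \lesssim_\nu A^{-\nu} M \Big\| \sum_{I \in \mathcal I^{(j)}} c_I |I|^{-1/2} (\mathrm{M} 1_I)^2 \Big\|_{L^2}. $$

Applying the Fefferman-Stein maximal inequality [9], which among other things asserts
that

$$ \Big\| \sum_i (\mathrm{M} f_i)^2 \Big\|_{L^2} = \Big\| \Big( \sum_i (\mathrm{M} f_i)^2 \Big)^{1/2} \Big\|_{L^4}^2 \lesssim \Big\| \Big( \sum_i |f_i|^2 \Big)^{1/2} \Big\|_{L^4}^2 = \Big\| \sum_i |f_i|^2 \Big\|_{L^2}, $$

we can bound the left-hand side of (65) by

$$ \lesssim_\nu A^{-\nu} M \Big\| \sum_{I \in \mathcal I^{(j)}} c_I |I|^{-1/2} 1_I \Big\|_{L^2}. $$

Since the intervals in $\mathcal I^{(j)}$ are disjoint, this expression is bounded by

$$ A^{-\nu} M \Big( \sum_{I \in \mathcal I^{(j)}} |c_I|^2 \Big)^{1/2} \lesssim A^{-\nu} M \|a\|_{l^2} $$

as desired.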

It remains to prove (66). By Cauchy-Schwarz it suffices to verify the estimates

$$ \sum_{P \in \mathbf P_{<I}} |I_P| |\psi_P(x)| \lesssim M^2 |I|^{1/2} \tag{67} $$

and

$$ \sum_{P \in \mathbf P_{<I}} |a_P|^2 |I_P|^{-1} |\psi_P(x)| \lesssim_\nu A^{-2\nu} c_I^2 |I|^{-3/2} \mathrm{M} 1_I(x)^4. \tag{68} $$

To prove (67), we break $\mathbf P_{<I}$ up into $\mathbf P_{<I} \cap T$, where $T$ ranges over the forest $\mathcal F$. Observe
that $\mathbf P_{<I} \cap T$ is empty unless $I \subset I_T$; by (54) we thus see that there are at most $M$ trees
$T$ for which $\mathbf P_{<I} \cap T$ is non-empty. Thus it suffices to show that

$$ \sum_{P \in \mathbf P_{<I} \cap T} |I_P| |\psi_P(x)| \lesssim |I|^{1/2}. $$

But from (27) we have $|I_P| |\psi_P(x)| \lesssim |I_P|^{1/2} (\mathrm{M} 1_{I_P}(x))^{100}$ (say). Since the $I_P$ are dyadic
subintervals of $I$ and each interval can occur at most once in $T$, the claim follows.

It remains to prove (68). From the definition of $c_I$ and the triangle inequality it suffices
to prove that

$$ |I_P|^{-1} |\psi_P(x)| \lesssim_\nu A^{-2\nu} |I|^{-3/2} \mathrm{M} 1_I(x)^4 $$

for each $P \in \mathbf P_{<I}$ and $x \notin I$. But from the $(A, d)$-sparseness hypothesis we see that
$I_P \subset I$ and $|I_P| \le 2^{-100A} |I|$, while from (55) (recalling that $I$ is the time interval of
some tree $T$) we have $\sup_{x \in I_P} \mathrm{dist}(x, \partial I) \ge A^{-\nu} |I|$. The claim now follows from (27),
the exponential gain of $|I|/|I_P| \ge 2^{100A}$ being more than sufficient to compensate for any
polynomial losses in $A$ or in $|I|/|I_P|$. ■

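The analog of Lemma 11.2 is

Lemma 12.2. For almost every $x$ we have

$$ \sup_k \Big| \sum_j \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_{<I} : |I_P| > 2^k} a_P \psi_P(x) \Big| \lesssim \sup_{j_0} \Big| \sum_{j < j_0} \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_{<I}} a_P \psi_P(x) \Big| + \sup_{I \in \mathcal I} \sup_k \Big| \sum_{P \in \mathbf P_{<I} : |I_P| > 2^k} a_P \psi_P(x) \Big| + E(x). $$

Proof We again may assume that $x$ is not the endpoint of a dyadic interval. We fix $k$;
it would suffice to find a $j_0$ and an $I_0 \in \mathcal I$ such that

$$ \Big| \sum_j \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_{<I} : |I_P| > 2^k} a_P \psi_P(x) \Big| \le \Big| \sum_{j < j_0} \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_{<I}} a_P \psi_P(x) \Big| + \Big| \sum_{P \in \mathbf P_{<I_0} : |I_P| > 2^k} a_P \psi_P(x) \Big| + O(E(x)). \tag{69} $$

By definition of $E(x)$, we have

$$ \Big| \sum_j \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_{<I} : |I_P| > 2^k} a_P \psi_P(x) \Big| \le \Big| \sum_j \sum_{I \in \mathcal I^{(j)} ; \, x \in I} \sum_{P \in \mathbf P_{<I} : |I_P| > 2^k} a_P \psi_P(x) \Big| + O(E(x)). $$

Let $j_0$ be the largest $j$ for which there exists an interval in $\mathcal I^{(j)}$ which contains $x$ and has
length greater than $2^k$. There is only one such interval; call it $I_0$. We can thus estimate
the contribution of the $j = j_0$ term by (69), and reduce to showing that

$$ \Big| \sum_{j < j_0} \sum_{I \in \mathcal I^{(j)} ; \, x \in I} \sum_{P \in \mathbf P_{<I} : |I_P| > 2^k} a_P \psi_P(x) \Big| \le \Big| \sum_{j < j_0} \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_{<I}} a_P \psi_P(x) \Big| + O(E(x)). $$

But if $I \in \mathcal I^{(j)}$ and $x \in I$ then $I$ and $I_0$ overlap. Since $I_0$ belongs to a later layer $\mathcal I^{(j_0)}$
than $I$ we must have $I_0 \subset I$, and thus $|I| \ge |I_0| > 2^k$. Hence the constraint $|I_P| > 2^k$ is
redundant and can be removed. The claim now follows from the triangle inequality. ■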

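In light of the above two lemmas, to prove (61) it would suffice to show that

$$ \Big\| \sup_{j_0} \Big| \sum_{j < j_0} \sum_{I \in \mathcal I^{(j)}} \sum_{P \in \mathbf P_{<I}} a_P \psi_P \Big| \Big\|_{L^2} \lesssim (\log(2 + AM))^{10} \|a\|_{l^2} $$

and

$$ \Big\| \sup_{I \in \mathcal I} \sup_k \Big| \sum_{P \in \mathbf P_{<I} : |I_P| > 2^k} a_P \psi_P(x) \Big| \Big\|_{L^2} \lesssim (\log(2 + AM))^{10} \|a\|_{l^2}. $$

The first inequality is proven in exactly the same way as (64) and is omitted, so we now
turn to the second inequality. By (63) it would suffice to show that

$$ \Big( \sum_{I \in \mathcal I} \Big\| \sup_k \Big| \sum_{P \in \mathbf P_{<I} : |I_P| > 2^k} a_P \psi_P(x) \Big| \Big\|_{L^2}^2 \Big)^{1/2} \lesssim (\log(2 + AM))^{10} \|a\|_{l^2}, $$

which in turn would follow from the estimate

$$ \Big\| \sup_k \Big| \sum_{P \in \mathbf P_{<I} : |I_P| > 2^k} a_P \psi_P(x) \Big| \Big\|_{L^2} \lesssim (\log(2 + AM))^{10} \Big( \sum_{P \in \mathbf P_{<I}} |a_P|^2 \Big)^{1/2} $$

for each fixed $I$.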

Let $T_1, T_2, \ldots, T_J$ be all the trees in $\mathcal F$ which intersect $\mathbf P_{<I}$; the time interval of such
trees must contain $I$, and so from (54) we have $J \le M$. We can then write

$$ \sum_{P \in \mathbf P_{<I} : |I_P| > 2^k} a_P \psi_P(x) = \sum_{j=1}^J \sum_{P \in \mathbf P_{<I} \cap T_j : |I_P| > 2^k} a_P \psi_P(x). $$

Let $\xi_1, \ldots, \xi_J$ be the base frequencies of $T_1, \ldots, T_J$. Since $T_j$ is a lacunary tree, we
see that if $P \in \mathbf P_{<I} \cap T_j$ then $\psi_P$ has Fourier support in an interval of width $|I_P|^{-1}$
and distance $\sim C_0 |I_P|^{-1}$ from $\xi_j$. By the strong disjointness of the $T_j$ we see that these
intervals must be disjoint. This implies that

$$ \sum_{j=1}^J \sum_{P \in \mathbf P_{<I} \cap T_j : |I_P| > 2^k} a_P \psi_P(x) = \Pi_k \sum_{j=1}^J \sum_{P \in \mathbf P_{<I} \cap T_j} a_P \psi_P(x) $$

where $\Pi_k$ is a Fourier projection to the union of $J$ intervals centered at $\xi_1, \ldots, \xi_J$, each of
radius $\sim C_0 2^{-k}$. We now invoke a deep maximal inequality of Bourgain [4, Lemma 4.11],
which asserts in our notation that

$$ \Big\| \sup_k |\Pi_k f| \Big\|_{L^2} \lesssim \log(2 + J)^2 \|f\|_{L^2}. $$
Using this, we reduce to showing that

$$ \Big\| \sum_{j=1}^J \sum_{P \in \mathbf P_{<I} \cap T_j} a_P \psi_P \Big\|_{L^2} \lesssim \log(2 + M) \Big( \sum_{P \in \mathbf P_{<I}} |a_P|^2 \Big)^{1/2}. $$

But this follows from the non-maximal Bessel inequality (62). This concludes the proof
of (61).

13. Proof of (62) 

We now prove (62). We shall in fact prove the slightly more general statement, which
may have some independent interest:

Proposition 13.1 (Nonmaximal Bessel inequality). Let $\mathcal F$ be a forest, let $\mathbf P := \bigcup_{T \in \mathcal F} T$,
and for each tile $P \in \mathbf P$ let $\psi_P$ be a wave packet adapted to $P$. Suppose also that
$\|\sum_{T \in \mathcal F} 1_{I_T}\|_{L^\infty} \le M$. Then we have

$$ \Big\| \sum_{P \in \mathbf P} a_P \psi_P \Big\|_{L^2} \lesssim \log(2 + M) \|a\|_{l^2} $$

for any sequence $a = (a_P)_{P \in \mathbf P}$ of complex numbers.

Remark 13.1. By duality and the $TT^*$ method, this inequality is also equivalent to the
assertion that

$$ \Big( \sum_{P \in \mathbf P} |\langle f, \psi_P \rangle|^2 \Big)^{1/2} \lesssim \log(2 + M) \|f\|_{L^2} $$

or that

$$ \Big\| \sum_{P \in \mathbf P} \langle f, \psi_P \rangle \psi_P \Big\|_{L^2} \lesssim \log(2 + M)^2 \|f\|_{L^2} $$

for all $f \in L^2$. The logarithmic loss can probably be lowered to $\log(2 + M)^{1/2}$ but cannot
be removed entirely; see [3].

We prove Proposition 13.1 in stages. The most important step is to establish a restricted
version of the proposition without the logarithmic loss in $M$.

Proposition 13.2 (Restricted Bessel inequality). Let $\mathcal F$, $\mathbf P$, $\psi_P$, $M$ be as in Proposition
13.1. Suppose that $a = (a_P)_{P \in \mathbf P}$ obeys the Carleson condition $\sum_{P \in T'} |a_P|^2 \le 2^{2m} |I_{T'}|$ for
all $T \in \mathcal F$ and all subtrees $T'$ of $T$, where $m$ is a fixed integer. Then we have

$$ \Big\| \sum_{P \in \mathbf P} a_P \psi_P \Big\|_{L^2} \lesssim 2^m \Big( \sum_{T \in \mathcal F} |I_T| \Big)^{1/2}. $$

Proof See [18, Lemma 6.6]. The main idea is to square both sides, use standard estimates
on the inner products $|\langle \psi_P, \psi_Q \rangle|$, and exploit the strong disjointness of the trees $T$ in the
forest $\mathcal F$. ■

Next, we establish restricted $L^p$ type estimates with a polynomial loss in $M$.

Proposition 13.3 (Crude Bessel inequality). Let $\mathcal F$, $\mathbf P$, $\psi_P$, $M$, $m$, $a$ be as in Proposition
13.2. Then for any $1 < p < \infty$ we have

$$ \Big\| \sum_{P \in \mathbf P} a_P \psi_P \Big\|_{L^p} \lesssim_p 2^m M \Big( \sum_{T \in \mathcal F} |I_T| \Big)^{1/p}. $$

Remark 13.2. One can improve the factor of $M$ here by interpolation with Proposition
13.2, and at the endpoint $p = 1$ one can remove the loss in $M$ entirely. However for our
purposes any polynomial factor in $M$ will suffice.

Proof First observe that we can partition the forest $\mathcal F$ into forests $\mathcal F_1 \cup \ldots \cup \mathcal F_M$, with
each $\mathcal F_j$ having multiplicity one in the sense that $\|\sum_{T \in \mathcal F_j} 1_{I_T}\|_{L^\infty} \le 1$. Indeed one could
set $\mathcal F_M$ to be a maximal collection of trees in $\mathcal F$ whose time intervals are distinct and are
maximal with respect to set inclusion, remove $\mathcal F_M$ from $\mathcal F$ (dropping the multiplicity by
1), and induct; we leave the details to the reader. From the triangle inequality we see
that it thus suffices to verify the claim when $M = 1$. We may also normalize $m = 0$.

Let $T$ be a tree in $\mathcal F$. We can partition the dyadic interval $I_T$ into four equally sized
dyadic sub-intervals $I_{T,1}, I_{T,2}, I_{T,3}, I_{T,4}$, from left to right. Let $T_l, T_r \subset T$ be the trees
$T_l := \{P \in T : I_P \subset I_{T,1}\}$ and $T_r := \{P \in T : I_P \subset I_{T,4}\}$ with spatial intervals
$I_{T,1}$ and $I_{T,4}$ respectively, and let $\mathcal F'$ be the forest formed by these trees $T_l$ and $T_r$, and
$\mathbf P' := \bigcup_{T \in \mathcal F'} T$. Observe that this forest also has multiplicity one, and that
$\sum_{T \in \mathcal F'} |I_T| = \frac{1}{2} \sum_{T \in \mathcal F} |I_T|$. It thus suffices by the obvious recursion argument to prove the Bessel
inequality with $\mathbf P$ replaced by $\mathbf P \setminus \mathbf P'$ (conceding a factor of $\sim p$ in the implicit
constant). The practical upshot of this reduction is that for any tree $T$ in the forest $\mathcal F$,
we may assume without loss of generality that none of the tiles in $T$ have time interval
contained in the left quarter $I_{T,1}$ or right quarter $I_{T,4}$ of the tree.

From the Carleson condition we have the crude bound $|a_P| \le |I_P|^{1/2}$ for all $P \in \mathbf P$.
From this, (27), and the above reduction on the trees $T$ one easily verifies the pointwise
estimate

$$ (1 - 1_{I_T}(x)) \Big| \sum_{P \in T} a_P \psi_P(x) \Big| \lesssim \mathrm{M} 1_{I_T}(x)^{10} $$

(say) for all $T \in \mathcal F$ and $x \in \mathbf R$. From the Fefferman-Stein maximal inequality [9] and the
multiplicity one nature of $\mathcal F$ we thus have

$$ \Big\| \sum_{T \in \mathcal F} (1 - 1_{I_T}(x)) \Big| \sum_{P \in T} a_P \psi_P(x) \Big| \Big\|_{L^p} \lesssim \Big\| \sum_{T \in \mathcal F} \mathrm{M} 1_{I_T}(x)^{10} \Big\|_{L^p} \lesssim_p \Big( \sum_{T \in \mathcal F} |I_T| \Big)^{1/p} $$

and hence by the triangle inequality it will suffice to show that

$$ \Big\| \sum_{T \in \mathcal F} 1_{I_T} \Big| \sum_{P \in T} a_P \psi_P \Big| \Big\|_{L^p} \lesssim_p \Big( \sum_{T \in \mathcal F} |I_T| \Big)^{1/p}. $$

From the disjointness of the intervals $I_T$ it thus suffices to show that

$$ \Big\| \sum_{P \in T} a_P \psi_P \Big\|_{L^p(I_T)} \lesssim_p |I_T|^{1/p} $$

for each tree $T$. By shifting the frequency dyadic grid if necessary we may assume that
$\xi_T = 0$; this essentially turns the wave packets $\psi_P$ into wavelets. The Carleson condition
on the $a_P$ and standard almost orthogonality estimates then give the $L^2$ estimate

$$ \Big\| \sum_{P \in T} a_P \psi_P \Big\|_{L^2} \lesssim |I_T|^{1/2} $$

and the BMO estimate

$$ \Big\| \sum_{P \in T} a_P \psi_P \Big\|_{BMO} \lesssim 1 $$

from which the claim follows by the John-Nirenberg inequality. ■

The idea is now to combine the above two propositions via some sort of real interpolation
method to obtain Proposition 13.1. It may well be possible to use one of the existing real
interpolation theorems in the literature to obtain this conclusion, but we will use a more
explicit argument, based on the following decomposition of an arbitrary $l^2$ sequence $a$ into
Carleson sequences.

Lemma 13.4 (Stopping time algorithm). Let $\mathcal F$, $\mathbf P$, $\psi_P$, $M$ be as in Proposition 13.1.
Suppose that $a = (a_P)_{P \in \mathbf P}$ obeys the Carleson condition $\sum_{P \in T'} |a_P|^2 \le 2^{2m} |I_{T'}|$ for all
$T \in \mathcal F$ and all subtrees $T'$ of $T$, where $m$ is a fixed integer. Then we can partition
$\mathbf P = \bigcup_{T \in \mathcal F_1} T \cup \bigcup_{T \in \mathcal F_2} T$, where $\mathcal F_1$ is a collection of subtrees of trees in $\mathcal F$ such that

$$ 2^{2m} \sum_{T \in \mathcal F_1} |I_T| \sim \sum_{T \in \mathcal F_1} \sum_{P \in T} |a_P|^2 \tag{70} $$

and $\mathcal F_2$ is a collection of subtrees of trees in $\mathcal F$ such that $\sum_{P \in T'} |a_P|^2 \le 2^{2(m-1)} |I_{T'}|$ for all
$T \in \mathcal F_2$ and all subtrees $T'$ of $T$. Furthermore we have $\|\sum_{T \in \mathcal F_j} 1_{I_T}\|_{L^\infty} \le M$ for $j = 1, 2$.

Proof It suffices to establish this lemma in the case when the forest $\mathcal F$ consists of a single
tree, $\mathcal F = \{T\}$, with $M = 1$, since the general case then follows by applying the lemma
to each tree separately and taking unions, using Lemma 9.3, as well as the observation
that the contribution to $\sum_{T \in \mathcal F_j} 1_{I_T}$ arising from a single tree $T$ in $\mathcal F$ will be bounded
pointwise by $1_{I_T}$.

Let $\mathcal I$ be the set of all dyadic intervals $I$ in $I_T$ such that $\sum_{P \in T : I_P \subset I} |a_P|^2 > 2^{2(m-1)} |I|$,
and such that $I$ is maximal with respect to set inclusion among all such intervals with
the property; thus the intervals in $\mathcal I$ are disjoint and lie inside $I_T$. We then let $\mathcal F_1$ be
the forest consisting of trees $T_I = (T_I, I, \xi_T)$ of the form $T_I := \{P \in T : I_P \subset I\}$,
where $I$ ranges over $\mathcal I$. By construction it is clear that $\mathcal F_1$ is indeed a forest, and that
$\sum_{P \in T'} |a_P|^2 \sim 2^{2m} |I_{T'}|$ for all $T' \in \mathcal F_1$; summing over all $T'$ we obtain (70). If we let $\mathcal F_2$
consist of the single tree $T_2 = (T_2, I_T, \xi_T)$ consisting of all the tiles not covered by $\mathcal F_1$,
thus $T_2 := T \setminus \bigcup_{T' \in \mathcal F_1} T'$, then we see from construction that $\sum_{P \in T'} |a_P|^2 \le 2^{2(m-1)} |I_{T'}|$
for all subtrees $T'$ of $T_2$. The claim follows. ■
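The selection in this proof can be tried out on a toy model. The Python sketch below is an illustration under simplifying assumptions that are not in the text: a single tree is modelled as a full dyadic tree of intervals $(s, n) = [n 2^s, (n+1) 2^s)$ carrying one coefficient $a_P$ each (frequency information is ignored), only subtrees of the form $\{P : I_P \subset I\}$ are tested, and the names `contains`, `energy` and `stopping_time_split` are hypothetical.

```python
import math
import random


def contains(outer, inner):
    """True if the dyadic interval inner = (s, n) lies inside outer = (S, N)."""
    (S, N), (s, n) = outer, inner
    return s <= S and (n >> (S - s)) == N


def energy(coeffs, I):
    """Sum of |a_P|^2 over tiles P whose time interval is contained in I."""
    return sum(abs(a) ** 2 for J, a in coeffs.items() if contains(I, J))


def stopping_time_split(coeffs, m):
    """Toy version of the selection in Lemma 13.4 for a single tree.

    Returns the maximal intervals I with energy > 2^(2(m-1)) |I|, together
    with the coefficients of the tiles not covered by any such interval.
    """
    threshold = 2.0 ** (2 * (m - 1))
    heavy = [I for I in coeffs if energy(coeffs, I) > threshold * 2 ** I[0]]
    selected = [I for I in heavy
                if not any(J != I and contains(J, I) for J in heavy)]
    rest = {J: a for J, a in coeffs.items()
            if not any(contains(I, J) for I in selected)}
    return selected, rest


random.seed(0)
TOP = 4  # the tree has time interval [0, 2^TOP)
coeffs = {(s, n): random.uniform(0.0, 1.0) * 2 ** (s / 2)
          for s in range(TOP + 1) for n in range(2 ** (TOP - s))}
# Smallest integer m for which the Carleson condition at level 2^(2m) holds.
ratio = max(energy(coeffs, I) / 2 ** I[0] for I in coeffs)
m = math.ceil(math.log2(ratio) / 2)
selected, rest = stopping_time_split(coeffs, m)
print("m =", m, "selected:", sorted(selected), "tiles left:", len(rest))
```

On the selected intervals the energy lies between $2^{2(m-1)}|I|$ and $2^{2m}|I|$, which is the comparability recorded in (70); the leftover tiles obey the Carleson condition with $m$ replaced by $m - 1$, so the procedure can be iterated.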

Iterating this lemma in the usual manner, starting with $m$ extremely large and exploiting
the fact that the forest $\mathcal F$ contains only finitely many tiles, we obtain

Corollary 13.5 (Iterated stopping time algorithm). Let $\mathcal F$, $\mathbf P$, $\psi_P$, $M$ be as in Proposition
13.1. Then there exist forests $\mathcal F_m$ for each integer $m$, together with a tile set $\mathbf P_{-\infty}$, such
that we have the partition

$$ \mathbf P = \bigcup_m \bigcup_{T \in \mathcal F_m} T \cup \mathbf P_{-\infty}, $$

such that we have the Carleson condition $\sum_{P \in T'} |a_P|^2 \le 2^{2m} |I_{T'}|$ for all $m$, all $T \in \mathcal F_m$
and all subtrees $T'$ of $T$, we have the bound

$$ \sum_m 2^{2m} \sum_{T \in \mathcal F_m} |I_T| \lesssim \sum_{P \in \mathbf P} |a_P|^2 \tag{71} $$

and such that $a_P = 0$ for all $P \in \mathbf P_{-\infty}$. Finally we have $\|\sum_{T \in \mathcal F_m} 1_{I_T}\|_{L^\infty} \le M$ for all $m$.
Of course, all but finitely many of the $\mathcal F_m$ will be empty.

We can now prove Proposition 13.1. We apply Corollary 13.5. The tiles in $\mathbf P_{-\infty}$ yield
no contribution and can be discarded. We reduce to establishing that

$$ \Big\| \sum_m F_m \Big\|_{L^2} \lesssim \log(2 + M) \|a\|_{l^2} $$

where $F_m := \sum_{T \in \mathcal F_m} \sum_{P \in T} a_P \psi_P$. If we let $L$ be the first integer larger than $100 \log(2 + M)$,
it suffices by the triangle inequality to show that

$$ \Big\| \sum_{m : m = l \bmod L} F_m \Big\|_{L^2} \lesssim \|a\|_{l^2} $$

for all residue classes $l \bmod L$. Squaring this and using symmetry it suffices to show that

$$ \sum_{m : m = l \bmod L} \|F_m\|_{L^2}^2 + \sum_{m, m' : m, m' = l \bmod L ; \, m' > m} |\langle F_m, F_{m'} \rangle| \lesssim \|a\|_{l^2}^2. $$

Note that if $m, m' = l \bmod L$ and $m' > m$ then $m' \ge m + L$. Introduce the quantities
$A_m := 2^{2m} \sum_{T \in \mathcal F_m} |I_T|$; from (71) it suffices to show that

$$ \sum_m \|F_m\|_{L^2}^2 + \sum_{m, m' : m' \ge m + L} |\langle F_m, F_{m'} \rangle| \lesssim \sum_m A_m. $$

From Proposition 13.2 we have $\|F_m\|_{L^2} \lesssim A_m^{1/2}$, and so we reduce to showing that

$$ \sum_{m, m' : m' \ge m + L} |\langle F_m, F_{m'} \rangle| \lesssim \sum_m A_m. $$

We now use Proposition 13.3 to obtain

$$ \|F_m\|_{L^4} \lesssim 2^m M \Big( \sum_{T \in \mathcal F_m} |I_T| \Big)^{1/4} = 2^{m/2} M A_m^{1/4} $$

and

$$ \|F_{m'}\|_{L^{4/3}} \lesssim 2^{m'} M \Big( \sum_{T \in \mathcal F_{m'}} |I_T| \Big)^{3/4} = 2^{-m'/2} M A_{m'}^{3/4} $$

and hence by Hölder's inequality

$$ |\langle F_m, F_{m'} \rangle| \lesssim 2^{-(m'-m)/2} M^2 A_m^{1/4} A_{m'}^{3/4} \lesssim 2^{-(m'-m)/2} M^2 (A_m + A_{m'}). $$

Summing this and using the geometric series formula we conclude

$$ \sum_{m, m' : m' \ge m + L} |\langle F_m, F_{m'} \rangle| \lesssim 2^{-L/2} M^2 \sum_m A_m $$

and the claim follows from the definition of $L$. This concludes the proof of Proposition
13.1, and (62) follows. The proof of Theorem 1.1 (and hence Corollary 1.2) is now
complete. ■
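The role of the choice of $L$ can be checked numerically. The short Python sketch below is only a sanity check, not part of the proof; it assumes the natural logarithm (the base only affects constants), and the helper names are hypothetical. It confirms that with $L$ the first integer larger than $100 \log(2 + M)$ the off-diagonal factor $2^{-L/2} M^2$ is at most 1, while the number of residue classes, which is what the triangle inequality costs, stays comparable to $\log(2 + M)$.

```python
import math


def choose_L(M):
    """First integer larger than 100 * log(2 + M) (natural log assumed)."""
    return math.floor(100 * math.log(2 + M)) + 1


def off_diagonal_factor(M):
    """The factor 2^(-L/2) * M^2 multiplying sum_m A_m in the last display."""
    L = choose_L(M)
    return 2.0 ** (-L / 2) * M ** 2


for M in [1, 2, 10, 10 ** 3, 10 ** 6]:
    L = choose_L(M)
    print(M, L, off_diagonal_factor(M))

# Elementary inequality used after Hoelder: A^(1/4) B^(3/4) <= A + B.
for A_m, A_m_prime in [(0.0, 1.0), (1.0, 1.0), (3.0, 0.5), (1e-3, 1e3)]:
    assert A_m ** 0.25 * A_m_prime ** 0.75 <= A_m + A_m_prime
```

The diagonal terms are handled without loss by Proposition 13.2, so the only logarithmic loss in Proposition 13.1 comes from the number $L$ of residue classes.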

14. Appendix: a correspondence principle

The purpose of this appendix is to prove the following correspondence principle. 

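Proposition 14.1. Let $A$ be an $(n-1) \times m$ matrix with integer entries. In addition to
the operators $T^*_{A,\mathbf R}$ and $T^*_{A,\mathbf X}$ defined in (3) and (4), we introduce the operator $T^*_{A,\mathbf Z}$ defined
on functions $\phi_i : \mathbf Z \to \mathbf R$ of compact support

$$ T^*_{A,\mathbf Z}(\phi_1, \ldots, \phi_{n-1})(l) := \sup_{N > 0} \frac{1}{(2N+1)^m} \sum_{|n_1|, \ldots, |n_m| \le N} \prod_{i=1}^{n-1} \Big| \phi_i\Big(l + \sum_{j=1}^m a_{i,j} n_j\Big) \Big|. $$

Let $1 < p_1, \ldots, p_{n-1} < \infty$ and $p'_n$ be such that $1/p_1 + \ldots + 1/p_{n-1} = 1/p'_n$. Then the
following claims are equivalent.

(i) $T^*_{A,\mathbf R}$ maps $L^{p_1}(\mathbf R) \times \ldots \times L^{p_{n-1}}(\mathbf R)$ to $L^{p'_n}(\mathbf R)$.

(ii) $T^*_{A,\mathbf Z}$ maps $l^{p_1}(\mathbf Z) \times \ldots \times l^{p_{n-1}}(\mathbf Z)$ to $l^{p'_n}(\mathbf Z)$.

(iii) For every dynamical system $\mathbf X$, $T^*_{A,\mathbf X}$ maps $L^{p_1}(\mathbf X) \times \ldots \times L^{p_{n-1}}(\mathbf X)$ to $L^{p'_n}(\mathbf X)$,
with a bound uniform in $\mathbf X$.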

Proof We first show that (i) implies (ii). Let (f>i, . . . , (p n -i '■ Z — > R have finite support. 
For each such (pi define fi : R — * R in such a way that /i(x) = <f>i(l) if £ G [I — |, I + |] 
for some / G Z, and otherwise. Note that for each x G [Z — |, I + |] and AT > 1 

~y n—l m 

1 X! 1 1 ■ X!"' •"■ ] <: 



(2N+1)™ ^ uirn-^^i^ 
v 7 |m|,...,|n m |<JVi=l j=l 



„ n—l m 

+ •/|ti| > ...,|t m |<AH-i£ = 7 ~ 



(2AT + ± r - j| tl |,...,| tm |<jv+i- = - — 
<TJ(/i,... ,/„-i)(x). 

From the hypothesis (i) we thus conclude (ii). 

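Now we show that (ii) implies (i). Without loss of generality we may take $f_1, \ldots, f_{n-1}$
to be smooth, positive and compactly supported. Approximating an integral by the
Riemann sum, we obtain

$$ \|T^*_{A,\mathbf R}(f_1, \ldots, f_{n-1})\|_{L^{p'_n}(\mathbf R)} = \lim_{\epsilon \to 0} \epsilon^{1/p'_n} \Big\| \sup_{N > 0} \frac{1}{(2N+1)^m} \sum_{|n_1|, \ldots, |n_m| \le N} \prod_{i=1}^{n-1} f_i\Big(\epsilon\Big(l + \sum_{j=1}^m a_{i,j} n_j\Big)\Big) \Big\|_{l^{p'_n}(\mathbf Z)}. $$

Applying the hypothesis (ii) we obtain

$$ \|T^*_{A,\mathbf R}(f_1, \ldots, f_{n-1})\|_{L^{p'_n}(\mathbf R)} \lesssim \limsup_{\epsilon \to 0} \epsilon^{1/p'_n} \prod_{i=1}^{n-1} \|f_i(\epsilon \cdot)\|_{l^{p_i}(\mathbf Z)}. $$

Approximating integrals by Riemann sums again and using the scaling hypothesis
$1/p_1 + \ldots + 1/p_{n-1} = 1/p'_n$ we obtain (i) as desired.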
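The Riemann sum step rests on the scaling identity $\epsilon^{1/p} \|f(\epsilon \cdot)\|_{l^p(\mathbf Z)} \to \|f\|_{L^p(\mathbf R)}$ as $\epsilon \to 0$. The Python sketch below illustrates this numerically; it is an illustration only, and for convenience it uses the Gaussian $e^{-x^2}$ (rapidly decaying rather than compactly supported, which is an assumption made here so that the exact norm is available in closed form). The names `f` and `scaled_lp_norm` are hypothetical.

```python
import math


def f(x):
    """A smooth, positive, rapidly decaying stand-in for a test function."""
    return math.exp(-x * x)


def scaled_lp_norm(eps, p, cutoff=12.0):
    """eps^(1/p) * || f(eps * .) ||_{l^p(Z)}, truncated to |eps * l| <= cutoff."""
    n_max = int(cutoff / eps)
    total = sum(f(eps * l) ** p for l in range(-n_max, n_max + 1))
    return (eps * total) ** (1.0 / p)


p = 3.0
# || exp(-x^2) ||_{L^p(R)} = (integral of exp(-p x^2))^(1/p) = (pi / p)^(1/(2p)).
exact = math.sqrt(math.pi / p) ** (1.0 / p)
for eps in [1.0, 0.5, 0.1, 0.01]:
    print(eps, scaled_lp_norm(eps, p), exact)
```

Applying this identity to each factor $f_i$ with exponent $p_i$, the powers of $\epsilon$ multiply to $\epsilon^{1/p_1 + \ldots + 1/p_{n-1}} = \epsilon^{1/p'_n}$, which is why the scaling hypothesis is needed.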

Now we show that (ii) implies (iii). Define $M := \max\{\sum_{j=1}^m |a_{i,j}| : 1 \le i \le n-1\}$.
Let $f_i \in L^{p_i}(\mathbf X)$, let $L \ge 1$ be an arbitrary number and let $x \in X$ also be arbitrary. By
applying the hypothesis (ii) to the functions $\phi_i$ defined by $\phi_i(l) = f_i(S^l x)$ if $|l| \le (M+1)L$
and $\phi_i(l) = 0$ otherwise, we get that

$$ \sum_{|l| \le L} \big( T^*_{A,\mathbf X,L}(f_1, \ldots, f_{n-1})(S^l x) \big)^{p'_n} \lesssim \prod_{i=1}^{n-1} \Big( \sum_{|l| \le (M+1)L} |f_i(S^l x)|^{p_i} \Big)^{p'_n/p_i} $$

with an implicit constant independent of $x$ and $L$. The quantity $T^*_{A,\mathbf X,L}(f_1, \ldots, f_{n-1})(x)$
denotes the maximal operator over averages with $N \le L$. Integration with respect to $x$
and Hölder's inequality imply that

$$ \|T^*_{A,\mathbf X,L}(f_1, \ldots, f_{n-1})\|_{L^{p'_n}(\mathbf X)} \lesssim \prod_{i=1}^{n-1} \|f_i\|_{L^{p_i}(\mathbf X)}. $$

By letting $L \to \infty$ we obtain (iii).

To show that (iii) implies (ii), we specialize (iii) to the finitary dynamical system $X =
\mathbf Z/N\mathbf Z$ with the standard shift $Sx := x + 1$ and the uniform probability measure. Letting
$N \to \infty$ (taking advantage of the uniformity of the bounds in (iii) in $N$) and renormalizing
the probability measure to be counting measure (taking advantage of the scaling condition)
we obtain (ii); we omit the details. ■